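Building an Automatic Software-Hardening Tool on LLVM, Step by Step

Instruction duplication is the basic technique of software fault tolerance, and an automatic software-hardening tool and a fault-injection tool are almost indispensable in software-hardening research. Following the progress of my own work, this post builds an automatic software-hardening tool on top of LLVM step by step. It records the key techniques involved in the implementation and the detailed problems encountered during development.

1. How to insert an instruction

Reference: https://stackoverflow.com/questions/35198935/add-an-llvm-instruction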

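BasicBlock *B = I->getParent();
if (auto *op = dyn_cast<BinaryOperator>(&*I))
{
    // Copy the instruction; the clone is not attached to any basic block yet.
    auto temp = op->clone();
    // Place the clone immediately before the original instruction.
    B->getInstList().insert(op, temp);
    // LLVM appends a numeric suffix to keep the two names unique.
    temp->setName(op->getName());
    // Redirect every user of the original to the clone.
    op->replaceAllUsesWith(temp);
}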

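Here I refers to an Instruction, and BinaryOperator is a subclass of Instruction.

Different kinds of instructions are inserted in different ways. The snippet above uses clone(), which is available on every instruction: it copies all the information of the original instruction, including its operands, but not its position (the clone belongs to no basic block until it is inserted). This is the most convenient way to implement duplication. We still need to insert compare, branch, and call instructions, however. Most of these classes provide a Create method, and with an example at hand they are not hard to learn.

CmpInst

Value *newOp = it->second;
CmpInst *cmp = CmpInst::Create(Instruction::ICmp, CmpInst::ICMP_NE, it->first, it->second, "eddi_check", inst);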

CallInst

A call without arguments

BB->getInstList().insertAfter(BB->begin(),newret);
FunctionType *error_handle_func_type = FunctionType::get(Type::getVoidTy(context), false);
Constant *error_handling_func = F.getParent()->getOrInsertFunction("error_handling",
                                                                   error_handle_func_type);
CallInst::Create(error_handling_func, "", BB->begin());

A call with arguments

std::vector<Type*> parameterVector(1);
parameterVector[0] = Type::getInt32Ty(context); //ID
ArrayRef<Type*> parameterVector_array_ref(parameterVector);

FunctionType *exit_func_type = FunctionType::get(Type::getVoidTy(context),parameterVector_array_ref, false);
Constant *exit_func = F.getParent()->getOrInsertFunction("exit", exit_func_type);

Value *one = ConstantInt::get(Type::getInt32Ty(context),1);
std::vector<Value*> exitArgs;
exitArgs.push_back(one);

ArrayRef<Value*> exitArgs_array_ref(exitArgs);

// Create the call instruction
CallInst::Create(exit_func, exitArgs_array_ref, "", BB->end());

2. How to identify the type of a scanned instruction

Reference: https://stackoverflow.com/questions/30250289/how-to-check-the-opcode-of-an-instruction


isa is used to check for an existing derived instruction class, while i.getOpcode() gives you the exact operation information.

According to the inheritance diagram for llvm::Instruction, LLVM internally divides all instructions into several different classes, such as llvm::BinaryOperator, llvm::CallInst, and llvm::CmpInst. But these classes carry no exact operation information.

Instruction::getOpcode(), however, reads the operation directly from the llvm::Instruction object. You can refer to Instruction.def for the definition of each instruction. Basically, the opcode is the exact operation the instruction performs.

Take an LLVM IR add, for example. You can use isa<llvm::BinaryOperator> to learn that it is a BinaryOperator, but that only tells you which instruction class it belongs to. To know whether it is an ADD or a SUB, use i.getOpcode().

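The answer above gives a good summary.

How many subclasses does Instruction have?

The inheritance diagram at http://llvm.org/doxygen/classllvm_1_1Instruction.html shows all subclasses of Instruction.
To look up the concrete opcode of each instruction (for a BinaryOperator, for example, whether it is an ADD or a SUB), see:

https://llvm.org/svn/llvm-project/llvm/trunk/include/llvm/IR/Instruction.def

However, I do not yet know how to make use of this information.

Use isa<> to determine which subclass an instruction belongs to, and use i.getOpcode() to determine which operation within that subclass it is.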

isa(i) and isa(i) without changing to if (i.getopcode()==…)
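
The two-level classification described above can be sketched without LLVM. The block below is a toy model, not the LLVM API: the Opcode enum and the helpers isBinaryOperator, isCompare, and describe are hypothetical names that stand in for isa<> (which class?) and getOpcode() (which exact operation?).

```cpp
#include <cassert>
#include <string>

// Toy stand-in for LLVM opcodes; NOT the real llvm::Instruction opcodes.
enum class Opcode { Add, Sub, Mul, ICmp, FCmp, Call };

// Coarse question, analogous to isa<BinaryOperator>(i).
inline bool isBinaryOperator(Opcode op) {
  return op == Opcode::Add || op == Opcode::Sub || op == Opcode::Mul;
}

// Coarse question, analogous to isa<CmpInst>(i).
inline bool isCompare(Opcode op) {
  return op == Opcode::ICmp || op == Opcode::FCmp;
}

// Two-level classification: instruction class first, exact opcode second.
inline std::string describe(Opcode op) {
  if (isBinaryOperator(op)) {
    switch (op) {
      case Opcode::Add: return "binary:add";
      case Opcode::Sub: return "binary:sub";
      default:          return "binary:mul";
    }
  }
  if (isCompare(op)) return op == Opcode::ICmp ? "cmp:int" : "cmp:float";
  return "other";
}
```

In a real pass the first level would be an isa<> or dyn_cast<> test and the second a switch over getOpcode(); the strings here exist only to make the result visible.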

3. How to attach metadata to an instruction

To attach a string, use the MDString class. The following stores the opcode name of each instruction:

Value *Elts[] = {
  MDString::get(context, inst->getOpcodeName(inst->getOpcode()))
};
MDNode *Node = MDNode::get(context, Elts);
temp->setMetadata("mxk", Node);

Attaching a number

std::vector<Value*> llfiindex(1);
llfiindex[0] = ConstantInt::get(Type::getInt64Ty(context), fi_index++);
MDNode *mdnode = MDNode::get(context, llfiindex);
inst->setMetadata("mxk", mdnode);

Removing an instruction's metadata

Instruction *temp = inst->clone();
temp->setMetadata("llfi_index", NULL);

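4. How to update the operands of duplicated instructions

To implement duplication, each duplicated instruction must reference the right operands. For example:

%mul = mul nsw i32 %8, %10      ; original instruction
%mul10 = mul nsw i32 %9, %11    ; duplicate: each operand must be replaced by the duplicate of the corresponding original operand

To do this, we need to know, for each duplicate instruction, which operands have to be updated.

Because LLVM IR is in SSA form, every value is assigned exactly once. In terms of data structures, the pointer to a value produced by an instruction (Value *) is the pointer to the defining instruction itself (Instruction *). Note that Instruction derives from User, which in turn derives from Value; Use is a separate class that represents one operand slot. Constants and function arguments are also Values, but they are not Instructions.

The key to updating a duplicate's operands is therefore to find the duplicate of each value the instruction uses. Whenever we duplicate an instruction, we store the mapping from the original value pointer (that is, the original instruction pointer) to the duplicate value pointer (that is, the duplicate instruction pointer) in a map. With that map in place, all we need is a way to enumerate every operand of an instruction.

https://www.zhihu.com/question/41999500 gives a very clear description of how use-def information is stored in LLVM.

Every LLVM instruction stores its def-use information.

For an Instruction inst, you can obtain an iterator over its operands and walk them with a for loop.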

for (User::op_iterator opIterator = temp->op_begin(); opIterator != temp->op_end(); opIterator++) {
  //....
}

The Use class also provides a conversion operator:

public:
  /// Normally Use will just implicitly convert to a Value* that it holds.
  operator Value*() const { return Val; }

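In other words, a Use converts implicitly to a Value pointer.

So we can use opIterator to obtain every operand of the current instruction.

The code that performs the duplication and updates the duplicate's operands is as follows: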

BasicBlock *B = inst->getParent();

// Clone the instruction and drop the fault-injection index from the copy.
Instruction *temp = inst->clone();
temp->setMetadata("llfi_index", NULL);
// Remember original -> duplicate so that later duplicates can find their operands.
shadow_value_map.insert(std::make_pair(inst, temp));

// Elts(1) starts with one null element, which appears as "null" in the !mxk1 node.
std::vector<Value*> Elts(1);
for (User::op_iterator opIterator = temp->op_begin(); opIterator != temp->op_end(); opIterator++) {
  Value *curOp = *opIterator;
  std::map<Value*, Value*>::iterator it = shadow_value_map.find(curOp);
  if (it != shadow_value_map.end()) {
    // The operand has a duplicate: make the clone use it instead of the original.
    Value *newOp = it->second;
    *opIterator = newOp;
    Elts.push_back(MDString::get(context, newOp->getName()));
  }
}
MDNode *Node = MDNode::get(context, Elts);
temp->setMetadata("mxk1", Node);

// Insert the duplicate right after the original; the original keeps all its users.
B->getInstList().insertAfter(inst,temp);
temp->setName(inst->getName());
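
The remapping step can be tried in isolation with a toy model. The sketch below does not use LLVM: ToyInst, ToyFunction, create, and duplicate are hypothetical names, and a ToyInst is just a name plus a list of operand pointers. It shows only the idea of looking up each operand in the original-to-duplicate map.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy SSA value: a name plus pointers to the values it uses (NOT llvm::Value).
struct ToyInst {
  std::string name;
  std::vector<ToyInst*> operands;
};

// Owns every instruction and remembers original -> duplicate.
struct ToyFunction {
  std::vector<std::unique_ptr<ToyInst>> insts;
  std::map<ToyInst*, ToyInst*> shadow;

  ToyInst* create(const std::string& name, std::vector<ToyInst*> ops = {}) {
    insts.push_back(std::unique_ptr<ToyInst>(new ToyInst{name, std::move(ops)}));
    return insts.back().get();
  }

  // Clone `orig`, then redirect each operand that already has a duplicate.
  ToyInst* duplicate(ToyInst* orig) {
    assert(orig != nullptr);
    ToyInst* copy = create(orig->name + ".dup", orig->operands);
    for (ToyInst*& op : copy->operands) {
      auto it = shadow.find(op);
      if (it != shadow.end()) op = it->second;  // use the duplicate operand
    }
    shadow[orig] = copy;
    return copy;
  }
};
```

Operands without an entry in the map (constants or values that were never duplicated) are left pointing at the original, which matches the behaviour of the loop above.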

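5. Scope of duplication

Ideally everything except terminators is duplicated. That matches the core idea of the EDDI algorithm, in which all instructions, registers, and memory are duplicated. However, because the effects of executing certain instructions are uncertain, doing so causes some problems.

The landingpad instruction

This instruction is part of LLVM's exception-handling mechanism. While implementing duplication and checking, the following error appeared: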

LandingPadInst not the first non-PHI instruction in the block.
  %21 = landingpad { i8*, i32 } personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*)
          cleanup, !mxk1 !2, !mxk !64, !numuses !4
Broken module found, compilation aborted!
0  opt             0x00000000018707fa llvm::sys::PrintStackTrace(_IO_FILE*) + 53
1  opt             0x0000000001870a8a
2  opt             0x0000000001870453
3  libpthread.so.0 0x00007f3b2999f390
4  libc.so.6       0x00007f3b28d58428 gsignal + 56
5  libc.so.6       0x00007f3b28d5a02a abort + 362
6  opt             0x000000000175e9eb
7  opt             0x000000000175e69f
8  opt             0x0000000001734ff8 llvm::FPPassManager::runOnFunction(llvm::Function&) + 330
9  opt             0x000000000173516e llvm::FPPassManager::runOnModule(llvm::Module&) + 120
10 opt             0x000000000173546e
11 opt             0x0000000001735966 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 262
12 opt             0x0000000001735b3f llvm::legacy::PassManager::run(llvm::Module&) + 39
13 opt             0x00000000008f3828 main + 5804
14 libc.so.6       0x00007f3b28d43830 __libc_start_main + 240
15 opt             0x00000000008e4ef9 _start + 41
Stack dump:
0.	Program arguments: /home/xiaofengwo/llvm/llvm-3.4/build/bin/opt -load /home/xiaofengwo/llvm/ir_sihft_llvm_build/bin/../llvm_passes/llfi-passes.so -insttracepass -maxtrace 250 -o /home/xiaofengwo/llvm/llvm-workspace/sample_programs/kakadu_source_flat/llfi/a-profiling.ll /home/xiaofengwo/llvm/llvm-workspace/sample_programs/kakadu_source_flat/llfi/a-llfi_index.ll -S
1.	Running pass 'Function Pass Manager' on module '/home/xiaofengwo/llvm/llvm-workspace/sample_programs/kakadu_source_flat/llfi/a-llfi_index.ll'.
2.	Running pass 'Module Verifier' on function '@_ZN13kdu_synthesisC2E14kdu_resolutionP20kdu_sample_allocatorbf'
Aborted (core dumped)

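As the verifier message says, a landingpad must be the first non-PHI instruction of its basic block, and a duplicated landingpad breaks that rule. For now we therefore exclude landingpad from the duplication scope.

The call instruction

If call instructions are also executed twice, the program crashes with a segmentation fault at run time. A possible cause is that every call level doubles its calls, so the lowest-level functions are invoked far too many times.

If the call is executed only once and the duplicate of the value it defines is then synchronized with the original, the memory reachable through the call's arguments may be valid only for the original variables: if an argument is a pointer, the location the duplicate pointer refers to may hold no value after the call. However, because the duplicate pointer and the original pointer refer to the same location, any operation that modifies that memory may cause problems.

The EDDI algorithm does not solve the call problem either: whether the values are synchronized after the call or the call is executed twice, data consistency after a call to a black-box function cannot be ensured automatically.

Todo: analyze each function to determine whether it can safely be called twice, and use that to implement the full EDDI algorithm.

StoreInst and AllocaInst

In the EDDI algorithm both should be duplicated, but because of the potential hazards of call instructions the full EDDI algorithm cannot be implemented for now; only the register-oriented SWIFT algorithm can. StoreInst and AllocaInst instructions are therefore not duplicated.

However, the values returned by AllocaInst and CallInst are currently not duplicated either, so some variables are still missed. A copy instruction should be inserted after these instructions, but I have not found such an instruction in LLVM IR so far; I am considering adding a BinaryOperator instruction to perform the copy.

Todo: duplicate the return values after AllocaInst and CallInst.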
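
The exclusions in this section amount to a single predicate. The sketch below is a toy model, not LLVM code: the Kind enum and the functions isTerminator, shouldDuplicate, and planDuplication are hypothetical names that encode the current scope (no terminators, landingpad, call, store, or alloca).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction kinds; NOT the LLVM opcode list.
enum class Kind { Alloca, Load, Store, BinaryOp, Cmp, GetElementPtr,
                  Call, LandingPad, Branch, Ret };

inline bool isTerminator(Kind k) { return k == Kind::Branch || k == Kind::Ret; }

// Mirrors the exclusions described in this section.
inline bool shouldDuplicate(Kind k) {
  if (isTerminator(k)) return false;   // terminators are never duplicated
  switch (k) {
    case Kind::LandingPad:             // must stay the first non-PHI instruction
    case Kind::Call:                   // unsafe to run black-box functions twice
    case Kind::Store:                  // memory is not duplicated in this scheme
    case Kind::Alloca:
      return false;
    default:
      return true;
  }
}

// Indices of the instructions in `block` that would receive a duplicate.
inline std::vector<std::size_t> planDuplication(const std::vector<Kind>& block) {
  assert(!block.empty() && isTerminator(block.back()));  // blocks end in a terminator
  std::vector<std::size_t> picked;
  for (std::size_t i = 0; i < block.size(); ++i)
    if (shouldDuplicate(block[i])) picked.push_back(i);
  return picked;
}
```

For a block shaped like load, load, mul, store, br, this selects the two loads and the mul, leaving the store and the branch as single instructions.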

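6. Checking

For the SWIFT algorithm, check points are inserted before StoreInst, CallInst, and branch instructions.

A check point consists of a compare and a branch. The compare instruction compares the variables that the following store or call is about to use with their duplicates; the branch instruction then either continues execution or jumps to the end, depending on the result.

Compare instructions

Compare instructions come in two kinds, ICmpInst and FCmpInst; which one to use depends on the operand type.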

if (it->first->getType()->isIntOrIntVectorTy() || it->first->getType()->isPtrOrPtrVectorTy()) {
  // if (!isa<llvm::IntegerType>(it->first->getType())) {
  cmp = CmpInst::Create(Instruction::ICmp, CmpInst::ICMP_NE, it->first, it->second, "eddi_check", inst);

} else {
  cmp = CmpInst::Create(Instruction::FCmp, CmpInst::FCMP_UNE, it->first, it->second, "eddi_check", inst);

}

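Here it is the iterator for the operand being checked: it->first is the original value and it->second is its duplicate.

With the code above, however, some instructions trigger errors when the duplicated variables are checked before br instructions: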

/home/xiaofengwo/llvm/llvm-3.4/build/bin/opt -load /home/xiaofengwo/llvm/ir_sihft_llvm_build/bin/../llvm_passes/llfi-passes.so -insttracepass -maxtrace 250 -o /home/xiaofengwo/llvm/llvm-workspace/sample_programs/kakadu_source_flat/llfi/a-profiling.ll /home/xiaofengwo/llvm/llvm-workspace/sample_programs/kakadu_source_flat/llfi/a-llfi_index.ll -S
Invalid operand types for FCmp instruction
  %eddi_check24 = fcmp une { i8*, i32 } %lpad.val3, %lpad.val323
Invalid operand types for FCmp instruction
  %eddi_check25 = fcmp une { i8*, i32 } %lpad.val3, %lpad.val323
Broken module found, compilation aborted!
0  opt             0x00000000018707fa llvm::sys::PrintStackTrace(_IO_FILE*) + 53
1  opt             0x0000000001870a8a
2  opt             0x0000000001870453
3  libpthread.so.0 0x00007f1b3f340390
4  libc.so.6       0x00007f1b3e6f9428 gsignal + 56
5  libc.so.6       0x00007f1b3e6fb02a abort + 362
6  opt             0x000000000175e9eb
7  opt             0x000000000175e69f
8  opt             0x0000000001734ff8 llvm::FPPassManager::runOnFunction(llvm::Function&) + 330
9  opt             0x000000000173516e llvm::FPPassManager::runOnModule(llvm::Module&) + 120
10 opt             0x000000000173546e
11 opt             0x0000000001735966 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 262
12 opt             0x0000000001735b3f llvm::legacy::PassManager::run(llvm::Module&) + 39
13 opt             0x00000000008f3828 main + 5804
14 libc.so.6       0x00007f1b3e6e4830 __libc_start_main + 240
15 opt             0x00000000008e4ef9 _start + 41
Stack dump:
0.	Program arguments: /home/xiaofengwo/llvm/llvm-3.4/build/bin/opt -load /home/xiaofengwo/llvm/ir_sihft_llvm_build/bin/../llvm_passes/llfi-passes.so -insttracepass -maxtrace 250 -o /home/xiaofengwo/llvm/llvm-workspace/sample_programs/kakadu_source_flat/llfi/a-profiling.ll /home/xiaofengwo/llvm/llvm-workspace/sample_programs/kakadu_source_flat/llfi/a-llfi_index.ll -S
1.	Running pass 'Function Pass Manager' on module '/home/xiaofengwo/llvm/llvm-workspace/sample_programs/kakadu_source_flat/llfi/a-llfi_index.ll'.
2.	Running pass 'Module Verifier' on function '@_ZN13kdu_synthesisC2E14kdu_resolutionP20kdu_sample_allocatorbf'
Aborted (core dumped)

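This was puzzling at first: CmpInst has only the ICmpInst and FCmpInst subclasses, and neither applies here. The message shows why: the operand type { i8*, i32 } is a struct (the landingpad result), while icmp and fcmp accept only integer, pointer, or floating-point operands (or vectors of them), not aggregates.

For now these special types are simply skipped.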

if (it->first->getType()->isIntOrIntVectorTy() || it->first->getType()->isPtrOrPtrVectorTy()) {
  // if (!isa<llvm::IntegerType>(it->first->getType())) {
  cmp = CmpInst::Create(Instruction::ICmp, CmpInst::ICMP_NE, it->first, it->second, "eddi_check", inst);

} else if (it->first->getType()->isFPOrFPVectorTy()) {
  cmp = CmpInst::Create(Instruction::FCmp, CmpInst::FCMP_UNE, it->first, it->second, "eddi_check", inst);

} else {
  continue;
}
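
The type dispatch and the meaning of the UNE predicate can be checked with a small stand-alone sketch. Nothing below is LLVM API: TypeClass, CheckKind, selectCheck, fcmpUne, and bitsDiffer are hypothetical names. fcmpUne models "unordered or not equal" on doubles, and bitsDiffer is one possible alternative that compares bit patterns instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

enum class TypeClass { Integer, Pointer, Float, Struct };
enum class CheckKind { ICmpNE, FCmpUNE, Skip };

// Same decision as the if / else if / else chain above.
inline CheckKind selectCheck(TypeClass t) {
  switch (t) {
    case TypeClass::Integer:
    case TypeClass::Pointer: return CheckKind::ICmpNE;
    case TypeClass::Float:   return CheckKind::FCmpUNE;
    default:                 return CheckKind::Skip;  // aggregates such as { i8*, i32 }
  }
}

// "Unordered or not equal": true when the values differ or either one is NaN.
inline bool fcmpUne(double a, double b) { return !(a == b); }

// Bit-pattern comparison: true only when the two encodings differ.
inline bool bitsDiffer(double a, double b) {
  static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
  std::uint64_t x, y;
  std::memcpy(&x, &a, sizeof x);
  std::memcpy(&y, &b, sizeof y);
  return x != y;
}
```

Under these definitions fcmpUne(x, x) is true when x is NaN, so an FCMP_UNE check on a value that is legitimately NaN would report a mismatch in a fault-free run. Whether that matters for the programs hardened here has not been examined in this post.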

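Branch instructions

Inserting a branch changes the basic-block structure. Within a single pass, doing so during traversal breaks the structure being iterated over, so for now a call to a checking function is inserted instead, which is less efficient.

Todo: add a mechanism for re-adjusting the basic blocks later.

Rescanning the instructions from the beginning after every basic-block split solves the problem but is very inefficient. What is needed is a fast way to re-adjust the iterator so that it points at the instruction currently being processed.

Candidate approaches:

Approach	Tried?	Result	Analysis
Store the comparison result in cmp and insert a call to a checking function	Yes		Simple; no basic-block splitting is needed and no iterator is invalidated, but run-time performance is poor and the window of vulnerability is large
Store the comparison result in cmp, split the basic block at the current instruction, and insert a br to error_detection	Yes	Error: "Instruction does not dominate all uses!"	Every split invalidates the iterator, which must then be re-acquired; this is very inefficient. In addition, the basic block holding the error_detection label at the end of the function was originally unreachable; once something branches to it, its ret instruction fails verification for reasons not yet understood.
Store the comparison result in cmp, split the basic block at the current instruction, and insert an if-then-else with a call to the error-handling function in the then branch	Yes	Error	Every split invalidates the iterator, which must then be re-acquired; this is very inefficient. The approach looks as if it should work, yet it still fails; the iterator is the suspected cause
Store the comparison result in cmp, split the basic block at the current instruction, and insert an if-then-else with an unconditional branch to error_detection in the then branch	Yes	Error: "Instruction does not dominate all uses!"	The problem appears to centre on the ret instruction and the requirement that a definition dominate all its uses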

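After various tests, the problems come down to two points:

How to scan every instruction reasonably efficiently when iterators are invalidated
How to resolve the "Instruction does not dominate all uses!" error

On the second point, the hardened IR shown in section 7 offers a hint: the mxk_error_detection block returns %16, which is defined in the return block. Once mxk_error_detection becomes reachable, that definition no longer dominates the use, which may well be the cause.

Operand comparison has to handle each data type separately. Int, Double, and Float are currently supported, and struct types still need to be considered. If the types do not match, an error is reported during fault-free execution, so every unsupported type costs many check points.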
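
One common way around iterator invalidation is to separate scanning from rewriting: first collect the places that need a check, then modify the function. The sketch below illustrates the idea on a toy model built from std::list, which, unlike a scan that is restarted after every split, visits each instruction once. Block, Function, Site, collectSites, and splitAtSites are hypothetical names, the "check" string stands in for the compare and branch, and LLVM's own containers behave differently in detail.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <vector>

using Block = std::list<std::string>;   // toy basic block: instruction texts in order
using Function = std::list<Block>;      // toy function: basic blocks in order

// A place where a check is needed, recorded during the read-only scan.
struct Site {
  Function::iterator block;
  Block::iterator inst;
};

// Phase 1: scan once without modifying anything.
inline std::vector<Site> collectSites(Function& f) {
  std::vector<Site> sites;
  for (auto b = f.begin(); b != f.end(); ++b)
    for (auto i = b->begin(); i != b->end(); ++i)
      if (i->compare(0, 5, "store") == 0) sites.push_back(Site{b, i});
  return sites;
}

// Phase 2: split before each recorded instruction, last site first, so that
// earlier sites are still inside the block that was recorded for them.
inline void splitAtSites(Function& f, const std::vector<Site>& sites) {
  for (auto s = sites.rbegin(); s != sites.rend(); ++s) {
    assert(s->inst != s->block->end());
    auto tail = f.insert(std::next(s->block), Block());
    // std::list::splice keeps iterators to the moved elements valid.
    tail->splice(tail->begin(), *s->block, s->inst, s->block->end());
    s->block->push_back("check");       // stands for the compare and branch
  }
}
```

Processing the recorded sites in reverse order is the design choice that keeps the recorded block iterators meaningful: each split only moves instructions that come at or after the current site.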

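7. The impact of back-end compilation on hardened code

Whether hardening is applied at the source level or at the intermediate-code level, it faces the same difficulty: the back end can destroy the hardening code. The damage includes, but is not limited to, instruction reordering and register allocation, which change where the check points are placed and what they mean (which values they check). The effect of back-end compilation on software hardening is enormous and can even render the hardening completely ineffective.

For example, take the original program factorial.c: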

#include<stdio.h>
main(argc, argv)
int argc;
char *argv[];
{
  int i,fact, n;
  n = atoi(argv[1]);
  fact = 1;
  for(i=1;i<=n;i++)
  {
    fact = fact * i;
  }
  printf("%d\n",fact);
  exit(0);
}

Its intermediate code, factorial.ll, is as follows:

; ModuleID = 'factorial.ll'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1

; Function Attrs: nounwind uwtable
define i32 @main(i32 %argc, i8** %argv) #0 {
entry:
  %retval = alloca i32, align 4
  %argc.addr = alloca i32, align 4
  %argv.addr = alloca i8**, align 8
  %i = alloca i32, align 4
  %fact = alloca i32, align 4
  %n = alloca i32, align 4
  store i32 0, i32* %retval
  store i32 %argc, i32* %argc.addr, align 4
  store i8** %argv, i8*** %argv.addr, align 8
  %0 = load i8*** %argv.addr, align 8
  %arrayidx = getelementptr inbounds i8** %0, i64 1
  %1 = load i8** %arrayidx, align 8
  %call = call i32 (i8*, ...)* bitcast (i32 (...)* @atoi to i32 (i8*, ...)*)(i8* %1)
  store i32 %call, i32* %n, align 4
  store i32 1, i32* %fact, align 4
  store i32 1, i32* %i, align 4
  br label %for.cond

for.cond:                                         ; preds = %for.inc, %entry
  %2 = load i32* %i, align 4
  %3 = load i32* %n, align 4
  %cmp = icmp sle i32 %2, %3
  br i1 %cmp, label %for.body, label %for.end

for.body:                                         ; preds = %for.cond
  %4 = load i32* %fact, align 4
  %5 = load i32* %i, align 4
  %mul = mul nsw i32 %4, %5
  store i32 %mul, i32* %fact, align 4
  br label %for.inc

for.inc:                                          ; preds = %for.body
  %6 = load i32* %i, align 4
  %inc = add nsw i32 %6, 1
  store i32 %inc, i32* %i, align 4
  br label %for.cond

for.end:                                          ; preds = %for.cond
  %7 = load i32* %fact, align 4
  %call1 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]* @.str, i32 0, i32 0), i32 %7)
  call void @exit(i32 0) #3
  unreachable

return:                                           ; No predecessors!
  %8 = load i32* %retval
  ret i32 %8
}

declare i32 @atoi(...) #1

declare i32 @printf(i8*, ...) #1

; Function Attrs: noreturn
declare void @exit(i32) #2

attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { noreturn "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #3 = { noreturn }

!llvm.ident = !{!0}

!0 = metadata !{metadata !"clang version 3.4 (tags/RELEASE_34/final)"}

After hardening with the SWIFT algorithm, the intermediate code factorial_swift.ll is as follows (ignore the profiling calls):

; ModuleID = '/home/xiaofengwo/llvm/llvm-workspace/sample_programs/factorial/llfi/a-llfi_index.ll'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1

; Function Attrs: nounwind uwtable
define i32 @main(i32 %argc, i8** %argv) #0 {
entry:
  %retval = alloca i32, align 4
  call void @doProfiling(i32 26)
  %argc.addr = alloca i32, align 4
  call void @doProfiling(i32 26)
  %argv.addr = alloca i8**, align 8
  call void @doProfiling(i32 26)
  %i = alloca i32, align 4
  call void @doProfiling(i32 26)
  %fact = alloca i32, align 4
  call void @doProfiling(i32 26)
  %n = alloca i32, align 4
  call void @doProfiling(i32 26)
  store i32 0, i32* %retval, !storemark !1
  store i32 %argc, i32* %argc.addr, align 4, !storemark !1
  store i8** %argv, i8*** %argv.addr, align 8, !storemark !1
  %0 = load i8*** %argv.addr, align 8
  %1 = load i8*** %argv.addr, align 8, !mxk1 !2, !mxk !3, !numuses !4
  call void @doProfiling(i32 27)
  %arrayidx = getelementptr inbounds i8** %0, i64 1
  %arrayidx1 = getelementptr inbounds i8** %1, i64 1, !mxk1 !5, !mxk !6, !numuses !4
  call void @doProfiling(i32 29)
  %2 = load i8** %arrayidx, align 8
  %3 = load i8** %arrayidx1, align 8, !mxk1 !7, !mxk !3, !numuses !4
  call void @doProfiling(i32 27)
  %eddi_check = icmp ne i8* %2, %3
  call void @check_and_error_handling(i1 %eddi_check)
  %call = call i32 (i8*, ...)* bitcast (i32 (...)* @atoi to i32 (i8*, ...)*)(i8* %2)
  call void @doProfiling(i32 49)
  store i32 %call, i32* %n, align 4, !storemark !1
  store i32 1, i32* %fact, align 4, !storemark !1
  store i32 1, i32* %i, align 4, !storemark !1
  br label %for.cond

for.cond:                                         ; preds = %for.inc, %entry
  %4 = load i32* %i, align 4
  %5 = load i32* %i, align 4, !mxk1 !2, !mxk !3, !numuses !4
  call void @doProfiling(i32 27)
  %6 = load i32* %n, align 4
  %7 = load i32* %n, align 4, !mxk1 !2, !mxk !3, !numuses !4
  call void @doProfiling(i32 27)
  %cmp = icmp sle i32 %4, %6
  %cmp2 = icmp sle i32 %5, %7, !mxk1 !8, !mxk !9, !numuses !4
  call void @doProfiling(i32 46)
  %eddi_check3 = icmp ne i1 %cmp, %cmp2
  call void @check_and_error_handling(i1 %eddi_check3)
  br i1 %cmp, label %for.body, label %for.end

for.body:                                         ; preds = %for.cond
  %8 = load i32* %fact, align 4
  %9 = load i32* %fact, align 4, !mxk1 !2, !mxk !3, !numuses !4
  call void @doProfiling(i32 27)
  %10 = load i32* %i, align 4
  %11 = load i32* %i, align 4, !mxk1 !2, !mxk !3, !numuses !4
  call void @doProfiling(i32 27)
  %mul = mul nsw i32 %8, %10
  %mul4 = mul nsw i32 %9, %11, !mxk1 !8, !mxk !10, !numuses !4
  call void @doProfiling(i32 12)
  %eddi_check5 = icmp ne i32 %mul, %mul4
  call void @check_and_error_handling(i1 %eddi_check5)
  store i32 %mul, i32* %fact, align 4, !storemark !1
  br label %for.inc

for.inc:                                          ; preds = %for.body
  %12 = load i32* %i, align 4
  %13 = load i32* %i, align 4, !mxk1 !2, !mxk !3, !numuses !4
  call void @doProfiling(i32 27)
  %inc = add nsw i32 %12, 1
  %inc6 = add nsw i32 %13, 1, !mxk1 !5, !mxk !11, !numuses !4
  call void @doProfiling(i32 8)
  %eddi_check7 = icmp ne i32 %inc, %inc6
  call void @check_and_error_handling(i1 %eddi_check7)
  store i32 %inc, i32* %i, align 4, !storemark !1
  br label %for.cond

for.end:                                          ; preds = %for.cond
  %14 = load i32* %fact, align 4
  %15 = load i32* %fact, align 4, !mxk1 !2, !mxk !3, !numuses !4
  call void @doProfiling(i32 27)
  %eddi_check8 = icmp ne i32 %14, %15
  call void @check_and_error_handling(i1 %eddi_check8)
  %call1 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]* @.str, i32 0, i32 0), i32 %14)
  call void @doProfiling(i32 49)
  call void @endProfiling()
  call void @exit(i32 0) #3
  unreachable

return:                                           ; No predecessors!
  %16 = load i32* %retval
  %17 = load i32* %retval, !mxk1 !2, !mxk !3, !numuses !12
  call void @doProfiling(i32 27)
  call void @endProfiling()
  %eddi_check9 = icmp ne i32 %16, %17
  call void @check_and_error_handling(i1 %eddi_check9)
  ret i32 %16

mxk_error_detection:                              ; No predecessors!
  call void @error_handling()
  %eddi_check10 = icmp ne i32 %16, %17
  call void @check_and_error_handling(i1 %eddi_check10)
  ret i32 %16
}

declare i32 @atoi(...) #1

declare i32 @printf(i8*, ...) #1

; Function Attrs: noreturn
declare void @exit(i32) #2

declare void @doProfiling(i32)

declare void @endProfiling()

declare void @error_handling()

declare void @check_and_error_handling(i1)

attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { noreturn "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #3 = { noreturn }

!llvm.ident = !{!0}

!0 = metadata !{metadata !"clang version 3.4 (tags/RELEASE_34/final)"}
!1 = metadata !{null, metadata !"store"}
!2 = metadata !{null}
!3 = metadata !{metadata !"load"}
!4 = metadata !{i64 1}
!5 = metadata !{null, metadata !""}
!6 = metadata !{metadata !"getelementptr"}
!7 = metadata !{null, metadata !"arrayidx1"}
!8 = metadata !{null, metadata !"", metadata !""}
!9 = metadata !{metadata !"icmp"}
!10 = metadata !{metadata !"mul"}
!11 = metadata !{metadata !"add"}
!12 = metadata !{i64 2}

After back-end compilation with "llc factorial_swift.ll -o factorial_swift.s", however, the resulting assembly is:

	.file	"a-profiling.ll"
	.text
	.globl	main
	.align	16, 0x90
	.type	main,@function
main:                                   # @main
	.cfi_startproc
# BB#0:                                 # %entry
	pushq	%rbp
.Ltmp3:
	.cfi_def_cfa_offset 16
.Ltmp4:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
.Ltmp5:
	.cfi_def_cfa_register %rbp
	pushq	%r14
	pushq	%rbx
	subq	$32, %rsp
.Ltmp6:
	.cfi_offset %rbx, -32
.Ltmp7:
	.cfi_offset %r14, -24
	movq	%rsi, %rbx
	movl	%edi, %r14d
	movl	$26, %edi
	callq	doProfiling
	movl	$26, %edi
	callq	doProfiling
	movl	$26, %edi
	callq	doProfiling
	movl	$26, %edi
	callq	doProfiling
	movl	$26, %edi
	callq	doProfiling
	movl	$26, %edi
	callq	doProfiling
	movl	$0, -20(%rbp)
	movl	%r14d, -24(%rbp)
	movq	%rbx, -32(%rbp)
	movl	$27, %edi
	callq	doProfiling
	movl	$29, %edi
	callq	doProfiling
	movq	8(%rbx), %rbx
	movl	$27, %edi
	callq	doProfiling
	xorl	%edi, %edi
	callq	check_and_error_handling
	xorl	%eax, %eax
	movq	%rbx, %rdi
	callq	atoi
	movl	%eax, %ebx
	movl	$49, %edi
	callq	doProfiling
	movl	%ebx, -44(%rbp)
	movl	$1, -40(%rbp)
	movl	$1, -36(%rbp)
	jmp	.LBB0_1
	.align	16, 0x90
.LBB0_2:                                # %for.body
                                        #   in Loop: Header=BB0_1 Depth=1
	movl	-40(%rbp), %ebx
	movl	$27, %edi
	callq	doProfiling
	imull	-36(%rbp), %ebx
	movl	$27, %edi
	callq	doProfiling
	movl	$12, %edi
	callq	doProfiling
	xorl	%edi, %edi
	callq	check_and_error_handling
	movl	%ebx, -40(%rbp)
	movl	-36(%rbp), %ebx
	movl	$27, %edi
	callq	doProfiling
	incl	%ebx
	movl	$8, %edi
	callq	doProfiling
	xorl	%edi, %edi
	callq	check_and_error_handling
	movl	%ebx, -36(%rbp)
.LBB0_1:                                # %for.cond
                                        # =>This Inner Loop Header: Depth=1
	movl	-36(%rbp), %r14d
	movl	$27, %edi
	callq	doProfiling
	movl	-44(%rbp), %ebx
	movl	$27, %edi
	callq	doProfiling
	movl	$46, %edi
	callq	doProfiling
	xorl	%edi, %edi
	callq	check_and_error_handling
	cmpl	%ebx, %r14d
	jle	.LBB0_2
# BB#3:                                 # %for.end
	movl	-40(%rbp), %ebx
	movl	$27, %edi
	callq	doProfiling
	xorl	%edi, %edi
	callq	check_and_error_handling
	movl	$.L.str, %edi
	xorl	%eax, %eax
	movl	%ebx, %esi
	callq	printf
	movl	$49, %edi
	callq	doProfiling
	callq	endProfiling
	xorl	%edi, %edi
	callq	exit
.Ltmp8:
	.size	main, .Ltmp8-main
	.cfi_endproc

	.type	.L.str,@object          # @.str
	.section	.rodata.str1.1,"aMS",@progbits,1
.L.str:
	.asciz	"%d\n"
	.size	.L.str, 4


	.ident	"clang version 3.4 (tags/RELEASE_34/final)"
	.section	".note.GNU-stack","",@progbits

Every check has turned into a meaningless comparison:

xorl %edi, %edi
callq check_and_error_handling

A fault-injection experiment on the code above shows that not a single error is detected.

=========================================================
Timeout Count: 157
Exception Count: 743
SDC Count: 481
Right Count: 1107
Detected Count: 0
=========================================================
time elapsed: 1.439475
=========================================================
Real Timeout Count: 770
Real Exception Count: 2026
Real SDC Count: 2500
Real Right Count: 7256
Real Detected Count: 0.0
=========================================================
Total Interval Number: 12552
(2488, 8)

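This problem is common throughout the software-hardening field, and earlier work has tried to mitigate the damage done by the back end through diversity transformations. In LLVM, the default optimizations make the change shown above during register allocation. Compiling without optimization (the -O0 option) partially solves the problem, but it hurts performance considerably. Solving it properly requires modifying the back-end compilation process, mainly register allocation.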
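
Why a check can collapse to a constant is easy to see on a toy model. The sketch below is not llc and makes no claim about which LLVM pass is actually responsible; it only shows that any redundancy elimination that assigns the same value number to two identical computations will see a check of the form x != x. Inst and foldedChecks are hypothetical names, and the block is assumed to contain no stores or calls between the duplicated instructions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Toy three-address instruction: result = op(lhs, rhs). NOT LLVM IR.
struct Inst { std::string result, op, lhs, rhs; };

// Local value numbering over a straight-line block without stores or calls.
// Returns the "ne" checks whose two operands received the same value number.
inline std::vector<std::string> foldedChecks(const std::vector<Inst>& block) {
  std::map<std::string, int> numberOf;                     // name -> value number
  std::map<std::tuple<std::string, int, int>, int> seen;   // (op, vn, vn) -> value number
  int next = 0;
  auto vn = [&](const std::string& name) -> int {
    auto it = numberOf.find(name);
    if (it == numberOf.end()) it = numberOf.emplace(name, next++).first;
    return it->second;
  };
  std::vector<std::string> folded;
  for (const Inst& i : block) {
    int a = vn(i.lhs);
    int b = vn(i.rhs);
    if (i.op == "ne" && a == b) folded.push_back(i.result);  // x != x is always false
    auto key = std::make_tuple(i.op, a, b);
    auto it = seen.find(key);
    int n = (it != seen.end()) ? it->second : (seen[key] = next++);
    numberOf[i.result] = n;
  }
  return folded;
}
```

Feeding it the shape of the for.body block (two loads of %fact, two loads of %i, two multiplications, one comparison) marks the comparison as foldable, because the original and the duplicate are computed from identical inputs.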

The fault-injection results after compiling with -O0 are shown below; the hardening is still far from effective:

Timeout Count: 91
Exception Count: 326
SDC Count: 496
Right Count: 877
Detected Count: 706
=========================================================
time elapsed: 1.225579
=========================================================
Real Timeout Count: 418
Real Exception Count: 1672
Real SDC Count: 1938
Real Right Count: 5609
Real Detected Count: 1515
=========================================================
Total Interval Number: 11152
(2496, 8)

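For the unhardened program compiled with -O0 (that is, without optimization), the dynamic instruction sequence has length 83, and the fault-injection results are: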


=========================================================
Timeout Count: 10
Exception Count: 199
SDC Count: 411
Right Count: 236
Detected Count: 0
=========================================================
time elapsed: 0.388901
=========================================================
Real Timeout Count: 20
Real Exception Count: 703
Real SDC Count: 652
Real Right Count: 1385
Real Detected Count: 0.0
=========================================================
Total Interval Number: 2760
(856, 8)

=========================================================
Timeout Count: 19
Exception Count: 208
SDC Count: 385
Right Count: 244
Detected Count: 0
=========================================================
time elapsed: 0.455671
=========================================================
Real Timeout Count: 36
Real Exception Count: 755
Real SDC Count: 562
Real Right Count: 1407
Real Detected Count: 0.0
=========================================================
Total Interval Number: 2760
(856, 8)
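
The raw counts are easier to compare as percentages. The sketch below assumes that the five "Count" lines of each block partition the injected faults, which is consistent with their sum matching the first number of the trailing tuple (2488, 2496, and 856). The post does not define the fields, and the Outcome struct and the helper names are hypothetical.

```cpp
#include <cassert>

// Outcome counts copied from the result blocks above.
struct Outcome { int timeout, exception, sdc, right, detected; };

inline int total(const Outcome& o) {
  return o.timeout + o.exception + o.sdc + o.right + o.detected;
}

// Share of one category in percent.
inline double percent(int part, const Outcome& o) {
  assert(total(o) > 0);
  return 100.0 * part / total(o);
}

const Outcome hardenedDefault = {157, 743, 481, 1107, 0};  // hardened, default llc
const Outcome hardenedO0      = {91, 326, 496, 877, 706};  // hardened, -O0
const Outcome unhardenedO0    = {10, 199, 411, 236, 0};    // unhardened, -O0, first run
```

With these inputs, detection under -O0 comes to about 28% of the injections, and the SDC share is about 19.9% for the hardened -O0 build against about 48% for the unhardened one.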

Comparing the assembly shows that even with -O0 some hardening code is still optimized away:

Assembly of the unhardened version:

.LBB0_1:                                # %for.cond
                                        # =>This Inner Loop Header: Depth=1
	movl	-20(%rbp), %eax
	cmpl	-28(%rbp), %eax
	jg	.LBB0_4

Assembly of the hardened version:

.LBB0_1:                                # %for.cond
                                        # =>This Inner Loop Header: Depth=1
	movl	-20(%rbp), %eax
	movl	-28(%rbp), %ecx
	subl	%ecx, %eax
	setle	%dl
	xorl	%ecx, %ecx
	movb	%cl, %sil
	movzbl	%sil, %edi
	movl	%eax, -44(%rbp)         # 4-byte Spill
	movb	%dl, -45(%rbp)          # 1-byte Spill
	callq	check_and_error_handling
	movb	-45(%rbp), %dl          # 1-byte Reload
	testb	$1, %dl
	jne	.LBB0_2
	jmp	.LBB0_4

To address this, a further attempt was made, with the comparison

=========================================================
Timeout Count: 110
Exception Count: 304
SDC Count: 483
Right Count: 673
Detected Count: 998
=========================================================
time elapsed: 7.668829
=========================================================
Real Timeout Count: 344
Real Exception Count: 1504
Real SDC Count: 1793
Real Right Count: 4744
Real Detected Count: 2447
=========================================================
Total Interval Number: 10832
(2568, 8)

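Todo: reduce the impact of back-end optimization on hardened code by changing how registers are allocated.

Open question: would passing both values as arguments to a comparison function avoid the dependence between them? Or, if the basic-block splitting problem described earlier were solved, would that prevent register allocation from defeating the hardening here?

Experiment 1: remove the compare instruction and move the consistency check between the original and the duplicate variable into a function. The function call is meant to keep the compiler from discovering the relationship between the two values, which would otherwise lead it to assign the same register to the original and the duplicate and so defeat the hardening.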
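
For orientation, the runtime side of the two checking styles might look like the sketch below. The post shows only the declarations of check_and_error_handling, check_and_error_handling2, and error_handling, so every function body here is an assumption, and the g_detected counter is a hypothetical replacement for whatever the real handler does (it keeps the sketch testable).

```cpp
#include <cassert>

// Hypothetical runtime for the checks; the post shows only the declarations.
static int g_detected = 0;   // a counter instead of aborting, for testability

extern "C" void error_handling() { ++g_detected; }

// Earlier variant: the caller computes the comparison and passes the i1 result.
extern "C" void check_and_error_handling(bool mismatch) {
  if (mismatch) error_handling();
}

// Experiment 1: the caller passes both values and the comparison happens here.
extern "C" void check_and_error_handling2(const void* original, const void* duplicate) {
  if (original != duplicate) error_handling();
}
```

Note that the IR below calls check_and_error_handling2 through a bitcast with i32 and i1 arguments. Comparing them as pointers relies on how the target passes narrow integers, so a separate checker per operand width may be the safer design; that is an assumption, not something tested here.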

Summary of the experimental results: even

factorial_swift.ll

; ModuleID = '/home/xiaofengwo/llvm/llvm-workspace/sample_programs/factorial/llfi/a-llfi_index.ll'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [4 x i8] c"%d\0A\00", align 1

; Function Attrs: nounwind uwtable
define i32 @main(i32 %argc, i8** %argv) #0 {
entry:
  %retval = alloca i32, align 4
  call void @doProfiling(i32 26)
  %argc.addr = alloca i32, align 4
  call void @doProfiling(i32 26)
  %argv.addr = alloca i8**, align 8
  call void @doProfiling(i32 26)
  %i = alloca i32, align 4
  call void @doProfiling(i32 26)
  %fact = alloca i32, align 4
  call void @doProfiling(i32 26)
  %n = alloca i32, align 4
  call void @doProfiling(i32 26)
  store i32 0, i32* %retval, !storemark !1
  store i32 %argc, i32* %argc.addr, align 4, !storemark !1
  store i8** %argv, i8*** %argv.addr, align 8, !storemark !1
  %0 = load i8*** %argv.addr, align 8
  %1 = load i8*** %argv.addr, align 8, !mxk1 !2, !mxk !3, !numuses !4
  call void @doProfiling(i32 27)
  %arrayidx = getelementptr inbounds i8** %0, i64 1
  %arrayidx1 = getelementptr inbounds i8** %1, i64 1, !mxk1 !5, !mxk !6, !numuses !4
  call void @doProfiling(i32 29)
  %2 = load i8** %arrayidx, align 8
  %3 = load i8** %arrayidx1, align 8, !mxk1 !7, !mxk !3, !numuses !4
  call void @doProfiling(i32 27)
  call void @check_and_error_handling2(i8* %2, i8* %3)
  %call = call i32 (i8*, ...)* bitcast (i32 (...)* @atoi to i32 (i8*, ...)*)(i8* %2)
  call void @doProfiling(i32 49)
  store i32 %call, i32* %n, align 4, !storemark !1
  store i32 1, i32* %fact, align 4, !storemark !1
  store i32 1, i32* %i, align 4, !storemark !1
  br label %for.cond

for.cond:                                         ; preds = %for.inc, %entry
  %4 = load i32* %i, align 4
  %5 = load i32* %i, align 4, !mxk1 !2, !mxk !3, !numuses !4
  call void @doProfiling(i32 27)
  %6 = load i32* %n, align 4
  %7 = load i32* %n, align 4, !mxk1 !2, !mxk !3, !numuses !4
  call void @doProfiling(i32 27)
  %cmp = icmp sle i32 %4, %6
  %cmp2 = icmp sle i32 %5, %7, !mxk1 !8, !mxk !9, !numuses !4
  call void @doProfiling(i32 46)
  call void bitcast (void (i8*, i8*)* @check_and_error_handling2 to void (i1, i1)*)(i1 %cmp, i1 %cmp2)
  br i1 %cmp, label %for.body, label %for.end

for.body:                                         ; preds = %for.cond
  %8 = load i32* %fact, align 4
  %9 = load i32* %fact, align 4, !mxk1 !2, !mxk !3, !numuses !4
  call void @doProfiling(i32 27)
  %10 = load i32* %i, align 4
  %11 = load i32* %i, align 4, !mxk1 !2, !mxk !3, !numuses !4
  call void @doProfiling(i32 27)
  %mul = mul nsw i32 %8, %10
  %mul3 = mul nsw i32 %9, %11, !mxk1 !8, !mxk !10, !numuses !4
  call void @doProfiling(i32 12)
  call void bitcast (void (i8*, i8*)* @check_and_error_handling2 to void (i32, i32)*)(i32 %mul, i32 %mul3)
  store i32 %mul, i32* %fact, align 4, !storemark !1
  br label %for.inc

for.inc:                                          ; preds = %for.body
  %12 = load i32* %i, align 4
  %13 = load i32* %i, align 4, !mxk1 !2, !mxk !3, !numuses !4
  call void @doProfiling(i32 27)
  %inc = add nsw i32 %12, 1
  %inc4 = add nsw i32 %13, 1, !mxk1 !5, !mxk !11, !numuses !4
  call void @doProfiling(i32 8)
  call void bitcast (void (i8*, i8*)* @check_and_error_handling2 to void (i32, i32)*)(i32 %inc, i32 %inc4)
  store i32 %inc, i32* %i, align 4, !storemark !1
  br label %for.cond

for.end:                                          ; preds = %for.cond
  %14 = load i32* %fact, align 4
  %15 = load i32* %fact, align 4, !mxk1 !2, !mxk !3, !numuses !4
  call void @doProfiling(i32 27)
  call void bitcast (void (i8*, i8*)* @check_and_error_handling2 to void (i32, i32)*)(i32 %14, i32 %15)
  %call1 = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([4 x i8]* @.str, i32 0, i32 0), i32 %14)
  call void @doProfiling(i32 49)
  call void @endProfiling()
  call void @exit(i32 0) #3
  unreachable

return:                                           ; No predecessors!
  %16 = load i32* %retval
  %17 = load i32* %retval, !mxk1 !2, !mxk !3, !numuses !12
  call void @doProfiling(i32 27)
  call void @endProfiling()
  call void bitcast (void (i8*, i8*)* @check_and_error_handling2 to void (i32, i32)*)(i32 %16, i32 %17)
  ret i32 %16

mxk_error_detection:                              ; No predecessors!
  call void @error_handling()
  call void bitcast (void (i8*, i8*)* @check_and_error_handling2 to void (i32, i32)*)(i32 %16, i32 %17)
  ret i32 %16
}

declare i32 @atoi(...) #1

declare i32 @printf(i8*, ...) #1

; Function Attrs: noreturn
declare void @exit(i32) #2

declare void @doProfiling(i32)

declare void @endProfiling()

declare void @error_handling()

declare void @check_and_error_handling2(i8*, i8*)

attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { noreturn "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #3 = { noreturn }

!llvm.ident = !{!0}

!0 = metadata !{metadata !"clang version 3.4 (tags/RELEASE_34/final)"}
!1 = metadata !{null, metadata !"store"}
!2 = metadata !{null}
!3 = metadata !{metadata !"load"}
!4 = metadata !{i64 1}
!5 = metadata !{null, metadata !""}
!6 = metadata !{metadata !"getelementptr"}
!7 = metadata !{null, metadata !"arrayidx1"}
!8 = metadata !{null, metadata !"", metadata !""}
!9 = metadata !{metadata !"icmp"}
!10 = metadata !{metadata !"mul"}
!11 = metadata !{metadata !"add"}
!12 = metadata !{i64 2}
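
The IR above only declares @check_and_error_handling2 and @error_handling; their bodies live in a runtime library that is not shown here. What follows is a minimal sketch of what such a pair might look like, not the actual runtime. The counter g_detected is a hypothetical addition so that the sketch can be exercised without terminating the process; a real handler would more likely log and exit.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical counter (not in the original): records how many mismatches
// were reported, so the sketch can be tested without exiting.
static int g_detected = 0;

// Assumed behaviour: report a detected soft error.
extern "C" void error_handling() {
    ++g_detected;
    std::fprintf(stderr, "soft error detected\n");
    // A real runtime would probably stop here, e.g. with std::exit().
}

// Matches the declared signature (i8*, i8*): compare the original value
// with its duplicate and report any difference.
extern "C" void check_and_error_handling2(void *original, void *duplicate) {
    if (original != duplicate) {
        error_handling();
    }
}
```

One caveat on the design: the IR calls this function through a bitcast with i32 or i1 arguments. Comparing the full pointer width then relies on how the code generator fills the upper bits of the argument registers, which the calling convention does not guarantee, so a per-type checker would be more robust.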

factorial_swift.s

	.file	"a-profiling.ll"
	.text
	.globl	main
	.align	16, 0x90
	.type	main,@function
main:                                   # @main
	.cfi_startproc
# BB#0:                                 # %entry
	pushq	%rbp
.Ltmp3:
	.cfi_def_cfa_offset 16
.Ltmp4:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
.Ltmp5:
	.cfi_def_cfa_register %rbp
	pushq	%r15
	pushq	%r14
	pushq	%rbx
	subq	$40, %rsp
.Ltmp6:
	.cfi_offset %rbx, -40
.Ltmp7:
	.cfi_offset %r14, -32
.Ltmp8:
	.cfi_offset %r15, -24
	movq	%rsi, %rbx
	movl	%edi, %r14d
	movl	$26, %edi
	callq	doProfiling
	movl	$26, %edi
	callq	doProfiling
	movl	$26, %edi
	callq	doProfiling
	movl	$26, %edi
	callq	doProfiling
	movl	$26, %edi
	callq	doProfiling
	movl	$26, %edi
	callq	doProfiling
	movl	$0, -28(%rbp)
	movl	%r14d, -32(%rbp)
	movq	%rbx, -40(%rbp)
	movl	$27, %edi
	callq	doProfiling
	movl	$29, %edi
	callq	doProfiling
	movq	8(%rbx), %rbx
	movl	$27, %edi
	callq	doProfiling
	movq	%rbx, %rdi
	movq	%rbx, %rsi
	callq	check_and_error_handling2
	xorl	%eax, %eax
	movq	%rbx, %rdi
	callq	atoi
	movl	%eax, %ebx
	movl	$49, %edi
	callq	doProfiling
	movl	%ebx, -52(%rbp)
	movl	$1, -48(%rbp)
	movl	$1, -44(%rbp)
	jmp	.LBB0_1
	.align	16, 0x90
.LBB0_2:                                # %for.body
                                        #   in Loop: Header=BB0_1 Depth=1
	movl	-48(%rbp), %ebx
	movl	$27, %edi
	callq	doProfiling
	imull	-44(%rbp), %ebx
	movl	$27, %edi
	callq	doProfiling
	movl	$12, %edi
	callq	doProfiling
	movl	%ebx, %edi
	movl	%ebx, %esi
	callq	check_and_error_handling2
	movl	%ebx, -48(%rbp)
	movl	-44(%rbp), %ebx
	movl	$27, %edi
	callq	doProfiling
	incl	%ebx
	movl	$8, %edi
	callq	doProfiling
	movl	%ebx, %edi
	movl	%ebx, %esi
	callq	check_and_error_handling2
	movl	%ebx, -44(%rbp)
.LBB0_1:                                # %for.cond
                                        # =>This Inner Loop Header: Depth=1
	movl	-44(%rbp), %r14d
	movl	$27, %edi
	callq	doProfiling
	movl	-52(%rbp), %ebx
	movl	$27, %edi
	callq	doProfiling
	cmpl	%ebx, %r14d
	setle	%r15b
	movl	$46, %edi
	callq	doProfiling
	movzbl	%r15b, %edi
	movl	%edi, %esi
	callq	check_and_error_handling2
	cmpl	%ebx, %r14d
	jle	.LBB0_2
# BB#3:                                 # %for.end
	movl	-48(%rbp), %ebx
	movl	$27, %edi
	callq	doProfiling
	movl	%ebx, %edi
	movl	%ebx, %esi
	callq	check_and_error_handling2
	movl	$.L.str, %edi
	xorl	%eax, %eax
	movl	%ebx, %esi
	callq	printf
	movl	$49, %edi
	callq	doProfiling
	callq	endProfiling
	xorl	%edi, %edi
	callq	exit
.Ltmp9:
	.size	main, .Ltmp9-main
	.cfi_endproc

	.type	.L.str,@object          # @.str
	.section	.rodata.str1.1,"aMS",@progbits,1
.L.str:
	.asciz	"%d\n"
	.size	.L.str, 4


	.ident	"clang version 3.4 (tags/RELEASE_34/final)"
	.section	".note.GNU-stack","",@progbits

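As the listing shows, this did not help at all: the duplicated instructions were still merged during code generation, so both arguments of every check_and_error_handling2 call are copied from the same register (for example, movl %ebx, %edi followed by movl %ebx, %esi).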

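On a whim, I tried changing llc's optimization option, and that does seem to help: in the output below, the duplicated loads and arithmetic survive (for example, two separate imull instructions in for.body).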

	.file	"a-profiling.ll"
	.text
	.globl	main
	.align	16, 0x90
	.type	main,@function
main:                                   # @main
	.cfi_startproc
# BB#0:                                 # %entry
	pushq	%rbp
.Ltmp2:
	.cfi_def_cfa_offset 16
.Ltmp3:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
.Ltmp4:
	.cfi_def_cfa_register %rbp
	subq	$176, %rsp
	movl	$26, %eax
	movl	%edi, -32(%rbp)         # 4-byte Spill
	movl	%eax, %edi
	movq	%rsi, -40(%rbp)         # 8-byte Spill
	callq	doProfiling
	movl	$26, %edi
	callq	doProfiling
	movl	$26, %edi
	callq	doProfiling
	movl	$26, %edi
	callq	doProfiling
	movl	$26, %edi
	callq	doProfiling
	movl	$26, %edi
	callq	doProfiling
	movl	$27, %edi
	movl	$0, -4(%rbp)
	movl	-32(%rbp), %eax         # 4-byte Reload
	movl	%eax, -8(%rbp)
	movq	-40(%rbp), %rsi         # 8-byte Reload
	movq	%rsi, -16(%rbp)
	movq	-16(%rbp), %rcx
	movq	-16(%rbp), %rdx
	movq	%rdx, -48(%rbp)         # 8-byte Spill
	movq	%rcx, -56(%rbp)         # 8-byte Spill
	callq	doProfiling
	movl	$29, %edi
	callq	doProfiling
	movl	$27, %edi
	movq	-56(%rbp), %rcx         # 8-byte Reload
	movq	8(%rcx), %rdx
	movq	-48(%rbp), %rsi         # 8-byte Reload
	movq	8(%rsi), %rsi
	movq	%rsi, -64(%rbp)         # 8-byte Spill
	movq	%rdx, -72(%rbp)         # 8-byte Spill
	callq	doProfiling
	movq	-72(%rbp), %rdi         # 8-byte Reload
	movq	-64(%rbp), %rsi         # 8-byte Reload
	callq	check_and_error_handling2
	movl	$49, %edi
	movq	-72(%rbp), %rcx         # 8-byte Reload
	movl	%edi, -76(%rbp)         # 4-byte Spill
	movq	%rcx, %rdi
	movb	$0, %al
	callq	atoi
	movl	-76(%rbp), %edi         # 4-byte Reload
	movl	%eax, -80(%rbp)         # 4-byte Spill
	callq	doProfiling
	movl	-80(%rbp), %eax         # 4-byte Reload
	movl	%eax, -28(%rbp)
	movl	$1, -24(%rbp)
	movl	$1, -20(%rbp)
.LBB0_1:                                # %for.cond
                                        # =>This Inner Loop Header: Depth=1
	movl	$27, %edi
	movl	-20(%rbp), %eax
	movl	-20(%rbp), %ecx
	movl	%ecx, -84(%rbp)         # 4-byte Spill
	movl	%eax, -88(%rbp)         # 4-byte Spill
	callq	doProfiling
	movl	$27, %edi
	movl	-28(%rbp), %eax
	movl	-28(%rbp), %ecx
	movl	%ecx, -92(%rbp)         # 4-byte Spill
	movl	%eax, -96(%rbp)         # 4-byte Spill
	callq	doProfiling
	movl	$46, %edi
	movl	-88(%rbp), %eax         # 4-byte Reload
	movl	-96(%rbp), %ecx         # 4-byte Reload
	cmpl	%ecx, %eax
	setle	%dl
	movl	-84(%rbp), %esi         # 4-byte Reload
	movl	-92(%rbp), %r8d         # 4-byte Reload
	cmpl	%r8d, %esi
	setle	%r9b
	movb	%r9b, -97(%rbp)         # 1-byte Spill
	movb	%dl, -98(%rbp)          # 1-byte Spill
	callq	doProfiling
	movb	-98(%rbp), %dl          # 1-byte Reload
	movzbl	%dl, %edi
	movb	-97(%rbp), %r9b         # 1-byte Reload
	movzbl	%r9b, %esi
	callq	check_and_error_handling2
	movb	-98(%rbp), %dl          # 1-byte Reload
	testb	$1, %dl
	jne	.LBB0_2
	jmp	.LBB0_4
.LBB0_2:                                # %for.body
                                        #   in Loop: Header=BB0_1 Depth=1
	movl	$27, %edi
	movl	-24(%rbp), %eax
	movl	-24(%rbp), %ecx
	movl	%ecx, -104(%rbp)        # 4-byte Spill
	movl	%eax, -108(%rbp)        # 4-byte Spill
	callq	doProfiling
	movl	$27, %edi
	movl	-20(%rbp), %eax
	movl	-20(%rbp), %ecx
	movl	%ecx, -112(%rbp)        # 4-byte Spill
	movl	%eax, -116(%rbp)        # 4-byte Spill
	callq	doProfiling
	movl	$12, %edi
	movl	-108(%rbp), %eax        # 4-byte Reload
	movl	-116(%rbp), %ecx        # 4-byte Reload
	imull	%ecx, %eax
	movl	-104(%rbp), %ecx        # 4-byte Reload
	movl	-112(%rbp), %edx        # 4-byte Reload
	imull	%edx, %ecx
	movl	%ecx, -120(%rbp)        # 4-byte Spill
	movl	%eax, -124(%rbp)        # 4-byte Spill
	callq	doProfiling
	movl	-124(%rbp), %edi        # 4-byte Reload
	movl	-120(%rbp), %esi        # 4-byte Reload
	callq	check_and_error_handling2
	movl	-124(%rbp), %eax        # 4-byte Reload
	movl	%eax, -24(%rbp)
# BB#3:                                 # %for.inc
                                        #   in Loop: Header=BB0_1 Depth=1
	movl	$27, %edi
	movl	-20(%rbp), %eax
	movl	-20(%rbp), %ecx
	movl	%ecx, -128(%rbp)        # 4-byte Spill
	movl	%eax, -132(%rbp)        # 4-byte Spill
	callq	doProfiling
	movl	$8, %edi
	movl	-132(%rbp), %eax        # 4-byte Reload
	addl	$1, %eax
	movl	-128(%rbp), %ecx        # 4-byte Reload
	addl	$1, %ecx
	movl	%ecx, -136(%rbp)        # 4-byte Spill
	movl	%eax, -140(%rbp)        # 4-byte Spill
	callq	doProfiling
	movl	-140(%rbp), %edi        # 4-byte Reload
	movl	-136(%rbp), %esi        # 4-byte Reload
	callq	check_and_error_handling2
	movl	-140(%rbp), %eax        # 4-byte Reload
	movl	%eax, -20(%rbp)
	jmp	.LBB0_1
.LBB0_4:                                # %for.end
	movl	$27, %edi
	movl	-24(%rbp), %eax
	movl	-24(%rbp), %esi
	movl	%esi, -144(%rbp)        # 4-byte Spill
	movl	%eax, -148(%rbp)        # 4-byte Spill
	callq	doProfiling
	leaq	.L.str, %rdi
	movl	-148(%rbp), %eax        # 4-byte Reload
	movq	%rdi, -160(%rbp)        # 8-byte Spill
	movl	%eax, %edi
	movl	-144(%rbp), %esi        # 4-byte Reload
	callq	check_and_error_handling2
	movq	-160(%rbp), %rdi        # 8-byte Reload
	movl	-148(%rbp), %esi        # 4-byte Reload
	movb	$0, %al
	callq	printf
	movl	$49, %edi
	movl	%eax, -164(%rbp)        # 4-byte Spill
	callq	doProfiling
	callq	endProfiling
	movl	$0, %edi
	callq	exit
.Ltmp5:
	.size	main, .Ltmp5-main
	.cfi_endproc

	.type	.L.str,@object          # @.str
	.section	.rodata.str1.1,"aMS",@progbits,1
.L.str:
	.asciz	"%d\n"
	.size	.L.str, 4


	.ident	"clang version 3.4 (tags/RELEASE_34/final)"
	.section	".note.GNU-stack","",@progbits
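
Comparing the two listings by eye is error-prone, so here is a rough sketch of how the comparison could be automated. It scans AT&T-syntax assembly and counts the calls to the checker whose second argument is just a copy of the first. The function name countCollapsedChecks and the heuristic itself are my own assumptions, not part of the tool: only mov* copies into the first two argument registers are tracked, and tracking is reset at every call.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Map the first two System V argument registers to canonical names.
static std::string canon(const std::string &op) {
    if (op == "%rdi" || op == "%edi" || op == "%di" || op == "%dil") return "ARG0";
    if (op == "%rsi" || op == "%esi" || op == "%si" || op == "%sil") return "ARG1";
    return op;
}

static std::string trim(const std::string &s) {
    const char *ws = " \t\r\n";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Hypothetical helper: counts calls to `checker` where the second argument
// was copied from the first one, or both were copied from the same register.
int countCollapsedChecks(const std::string &asmText,
                         const std::string &checker = "check_and_error_handling2") {
    std::istringstream in(asmText);
    std::string line, arg0Src, arg1Src;
    int collapsed = 0;
    while (std::getline(in, line)) {
        line = trim(line.substr(0, line.find('#')));  // drop trailing comments
        if (line.empty()) continue;
        std::size_t sp = line.find_first_of(" \t");
        std::string mnemonic = line.substr(0, sp);
        std::string operands = (sp == std::string::npos) ? "" : trim(line.substr(sp));
        if (mnemonic.compare(0, 3, "mov") == 0) {
            std::size_t comma = operands.rfind(',');
            if (comma == std::string::npos) continue;
            std::string src = canon(trim(operands.substr(0, comma)));
            std::string dst = canon(trim(operands.substr(comma + 1)));
            if (dst == "ARG0") arg0Src = src;
            else if (dst == "ARG1") arg1Src = src;
        } else if (mnemonic.compare(0, 4, "call") == 0) {
            bool sameRegister = !arg1Src.empty() && arg1Src[0] == '%' &&
                                arg1Src == arg0Src;
            if (operands == checker && (arg1Src == "ARG0" || sameRegister)) {
                ++collapsed;
            }
            arg0Src.clear();  // argument registers are clobbered by the call
            arg1Src.clear();
        }
    }
    return collapsed;
}
```

Fed the check sequences from the first listing, this should flag each of them; fed the ones from the second listing, it should flag none, because there the two arguments are reloaded from different spill slots.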