-
Notifications
You must be signed in to change notification settings - Fork 152
Writing an IFDS Analysis
In order to solve a data-flow analysis using IFDS you have to create an instance of the IFDSTabulationProblem interface.
For that you create a new class that inherits from IFDSTabulationProblemand implements all its pure-virtual functions.
A guide on how to implement these functions you can find below.
You can find example code here.
First, though, we have to supply an AnalysisDomain as type parameter to the IFDSTabulationProblem template.
The AnalysisDomain is a struct that aggregates a number of type aliases that are used by the interface, including the instruction-type n_t and the data-flow fact type d_t.
In most cases, it is a good idea to use the pre-defined LLVMIFDSAnalysisDomainDefault that defines the type-aliases based on the most common uses on LLVM IR (n_t = const llvm::Instruction * and d_t = const llvm::Value *.
If you need to customize some of these types, just create a new struct inheriting from LLVMIFDSAnalysisDomainDefault that shadows the customized types.
So, for example to customize d_t:
struct MyAnalysisDomain : psr::LLVMIFDSAnalysisDomainDefault {
using d_t = MyDataflowFact;
};
class MyIFDSAnalysis : public psr::IFDSTabulationProblem<MyAnalysisDomain> {
...
};In IFDS, data-flow facts are propagated through the program in a point-wise manner.
That is, the IFDSSolver repeatedly invokes the flow-functions per instruction for each incoming data-flow fact in isolation.
Thus, a FlowFunction in PhASAR has the signature d_t -> set<d_t>, computing the data-flow facts that should hold after the current statement if the analysis has already inferred that the incoming "source" data-flow fact holds before the current statement.
Unconditionally generating data-flow facts is done by artificially generating them from the special zero fact.
For the IFDSSolver to know, what flow function should be invoked at a certain instruction, we have to override the so-called flow-function factories of the IFDSTabulationProblem.
For each reached instruction exactly one flow-function factory is called exactly once (except for the summary flow-function factory -- see below).
The resulting flow function is cached within the solver.
There are the following flow-function factories:
-
getNormalFlowFunction:- Intra-procedural data flow. Gets invoked for each instruction that is neither a function-call nor a return/resume.
- Use the flow-function templates from
FlowFunctions.hto simplify the process. If none of the pre-defined helper functions fits your needs, you may want to uselambdaFlowand provide a C++-lambda expression as flow function.
-
getCallToRetFlowFunction:- Intra-procedural data flow at function calls. Gets invoked for each function call instruction and models all data flows that are not affected by the call.
- Consider using the helper function
mapFactsAlongsideCallSite
-
getCallFlowFunction:- Inter-procedural data flow at function calls. Gets invoked for each callsite-callee pair and is responsible for mapping arguments at the callsite to the corresponding parameters within the callee. Additionally, globals are mapped to the callee context.
- Consider using the helper function
mapFactsToCallee
-
getRetFlowFunction:- Inter-procedural data flow at function returns. Gets invoked for each
retandresumeinstruction and is responsible for mapping the return-value to the callsite and optionally mapping back parameters to their corresponding arguments at the callsite. - Consider using the helper function
mapFactsToCaller
- Inter-procedural data flow at function returns. Gets invoked for each
-
getSummaryFlowFunction[optional]:- For special functions where you want to customize their effects on the analysis. Prevents the function's body being analyzed by the IFDS analysis.
- If this function returns non-nullptr, the
getCallFlowFunctionfunction will not be called. - Useful for modeling the effects of compiler-intrinsics or standard-library functions
- Defaults to unconditionally returning
nullptr
In addition to the flow-function factories, you have to implement a number of miscellaneous functions:
-
ctor: For being able to create an instance of your IFDS analysis class, you (obviously) have to provide a constructor. As the base-classIFDSTabulationProblemdoes not provide a default-constructor, we have to explicitly initialize the base as well. In fact you have to provide the following arguments to the base:-
IRDB: An instance ofLLVMProjectIRDBthat owns the LLVM IR to analyze -
EntryPoints: A vector of (mangled) function names that identify the functions within theIRDBwhere the analysis should start -
ZeroValue[optional]: A dummy data-flow fact that is assumed to hold at each instruction unconditionally. UsuallyLLVMZeroValue::getInstance().
Important: If not provided as ctor argument, you have to explicitly initialize the inheritedZeroValuedata-member to a non-nullopt value in your ctor.
-
-
psr::NToString(n_t)[optional]: Customizes, how an instruction (n_t) should be printed, for LLVM values making use ofllvmIRToString. For custom types defaults to.str(),to_string(...)oroperator<<in decreasing priority. -
psr::DToString(d_t)[optional]: Customizes, how a data-flow fact (d_t) should be printed, for LLVM values making use ofllvmIRToStringorllvmIRToShortString. For custom types defaults to.str(),to_string(...)oroperator<<in decreasing priority. -
psr::FToString(d_t)[optional]: Customizes, how a function (d_t) should be printed, for LLVM functions only printing the name. For custom types defaults to.str(),to_string(...)oroperator<<in decreasing priority. -
isZeroValue[optional]: Checks whether the given data-flow fact is the special zero value. Defaults to==comparison withgetZeroValue(). -
initialSeeds: Creates a map from instruction (n_t) to data-flow fact (d_t) that defines, what facts should hold unconditionally at the points where the analysis should start. Should be based on theEntryPointsctor parameter. You may want to use the utilitiesaddSeedsForStartingPointsorforallStartingPointsfor this. -
emitTextReport[optional]: Customizes, how the analysis results should be printed to the terminal.
The flow-function factories use the custom type FlowFunctionPtrType as a return type.
This type hides the memory management mechanism used from the actual analysis interface.
Currently, FlowFunctionPtrType maps to std::shared_ptr<FlowFunction<D>>, but this may change in the future.
- Home
- Building PhASAR
- Getting Started:
- Using PhASAR with Docker
- FAQ
- Tutorials
- Contributing
- Reference Material
- Update to Newer LLVM Versions