IR2Vec
The following guide details the steps followed in upgrading the LLVM version supported by IR2Vec. IR2Vec-Version-Upgrade-Checks has the required scripts to be run for this process.
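The IR collection scripts are in the collect_ir folder: collect_ir/spec/get_ll_files_list.py, collect_ir/boost/get_ll_files_list.py and collect_ir/spec/get_ll_spec.sh. First, compile the SPEC .c* files with the relevant LLVM version using collect_ir/spec/get_ll_spec.sh. For the Boost sources, use get_boost_ll.py for this purpose.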
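To make the compile step concrete, here is a minimal sketch of the kind of command such a script issues for each source file. It is not taken from get_ll_spec.sh or get_boost_ll.py: the versioned compiler names (clang-17, clang++-17, as installed by the apt.llvm.org packages) and the one-file-per-command layout are assumptions. The -S -emit-llvm flags are the standard clang options for writing textual IR.

```python
"""Build the clang command that emits textual LLVM IR for one source file.

Assumptions (not taken from get_ll_spec.sh or get_boost_ll.py): versioned
compilers are named clang-<N> / clang++-<N>, and each file is compiled on
its own into <out_dir>/<stem>.ll.
"""
import subprocess
from pathlib import Path

C_SUFFIXES = {".c"}
CXX_SUFFIXES = {".cc", ".cpp", ".cxx"}


def ir_compile_command(src: Path, out_dir: Path, llvm_version: int) -> list[str]:
    """Return the argv that compiles `src` to LLVM IR with the given version."""
    if src.suffix in C_SUFFIXES:
        compiler = f"clang-{llvm_version}"
    elif src.suffix in CXX_SUFFIXES:
        compiler = f"clang++-{llvm_version}"
    else:
        raise ValueError(f"unsupported source file: {src}")
    out_file = out_dir / f"{src.stem}.ll"
    # -S -emit-llvm writes human-readable .ll instead of an object file.
    return [compiler, "-S", "-emit-llvm", str(src), "-o", str(out_file)]


def compile_to_ir(src: Path, out_dir: Path, llvm_version: int) -> Path:
    """Run the command; raises CalledProcessError if clang fails."""
    out_dir.mkdir(parents=True, exist_ok=True)
    cmd = ir_compile_command(src, out_dir, llvm_version)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])


if __name__ == "__main__":
    print(" ".join(ir_compile_command(Path("bench/foo.cpp"), Path("ll_out"), 17)))
```

In this sketch the upgrade amounts to changing the single llvm_version argument; the real scripts may hard-code compiler paths instead, so check them for every occurrence of the old version.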
Collect the paths to all these compiled .ll / .o files in a single place using the scripts collect_ir/spec/get_ll_files_list.py and collect_ir/boost/get_ll_files_list.py.
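A minimal sketch of the path-collection step, assuming the compiled files sit under one or more output folders and that a plain list with one absolute path per line is what the later steps consume. It is a hypothetical stand-in, not the contents of collect_ir/spec/get_ll_files_list.py or collect_ir/boost/get_ll_files_list.py.

```python
"""Collect the paths of compiled .ll files under one or more folders.

Hypothetical stand-in for collect_ir/*/get_ll_files_list.py: the folder
names and the output format (one absolute path per line) are assumptions.
"""
import sys
from pathlib import Path


def collect_ll_paths(roots: list[Path], suffix: str = ".ll") -> list[str]:
    """Recursively find files ending in `suffix` under every root, sorted."""
    found: set[str] = set()
    for root in roots:
        if not root.is_dir():
            continue  # tolerate folders that were not produced
        for path in root.rglob(f"*{suffix}"):
            if path.is_file():
                found.add(str(path.resolve()))
    return sorted(found)


if __name__ == "__main__":
    # Usage: python3 list_ll_files.py spec_ll boost_ll > ll_files_list.txt
    for p in collect_ll_paths([Path(arg) for arg in sys.argv[1:]]):
        print(p)
```

Passing suffix=".o" collects the object files instead.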
Once we have the .ll file paths, we go to the seed_embeddings folder in the main IR2Vec repository. Here, the process involves the following tasks.
First, update the triplets.sh bash file with the changes needed for the new LLVM version and run it to generate the triplets. Instructions to run it are available in seed_embeddings/README.md.
Next, run the preprocessing script in the OpenKE folder, at IR2Vec/seed_embeddings/OpenKE/preprocess.py. Inside the preprocessed folder, we should then have the relevant preprocessed data generated.
Create an embeddings folder here as well; it will be relevant for the next step.

We have recently retrained the IR2Vec embeddings with a larger dataset, ComPile. This dataset is a collection of LLVM IR files from open-source projects and is considerably larger than the dataset used in the original IR2Vec paper. Further details about the retraining process can be found here.
Once the trainIDs, relations and entities files are generated, we can use them, as is, in the training process that follows.
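The trainIDs, relations and entities files are what OpenKE trains on. As an illustrative sketch of how triplets turn into such files, the code below assigns integer IDs and produces OpenKE's standard layout (entity2id.txt, relation2id.txt, train2id.txt). Whether preprocess.py uses exactly these names, and whether triplets.sh emits "head relation tail" lines, are assumptions; the sample triplets are made up.

```python
"""Map (head, relation, tail) triplets to OpenKE-style ID files.

Assumptions: each input line is "head relation tail" separated by
whitespace (the layout produced by triplets.sh may differ), and IDs are
assigned in order of first appearance. Output follows OpenKE's layout:
a count line, then "name<TAB>id" lines for entities/relations, and
"head_id tail_id relation_id" lines for training triples.
"""


def build_openke_files(lines: list[str]) -> dict[str, list[str]]:
    entity_ids: dict[str, int] = {}
    relation_ids: dict[str, int] = {}
    triples: list[str] = []

    def intern(table: dict[str, int], name: str) -> int:
        # Give a name the next free ID the first time it is seen.
        return table.setdefault(name, len(table))

    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        head, rel, tail = parts
        h = intern(entity_ids, head)
        t = intern(entity_ids, tail)
        r = intern(relation_ids, rel)
        triples.append(f"{h} {t} {r}")

    def id_file(table: dict[str, int]) -> list[str]:
        return [str(len(table))] + [f"{name}\t{i}" for name, i in table.items()]

    return {
        "entity2id.txt": id_file(entity_ids),
        "relation2id.txt": id_file(relation_ids),
        "train2id.txt": [str(len(triples))] + triples,
    }


if __name__ == "__main__":
    sample = ["Store Type PointerTy", "Store Next Load", "Load Type IntegerTy"]
    for name, content in build_openke_files(sample).items():
        print(f"== {name}")
        print("\n".join(content))
```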
Training uses the generate_embeddings_ray.py file in the OpenKE folder. Use the openKE.yaml file to create the conda environment (for example, conda env create -f openKE.yaml), and make sure it has Ray and tensorflow installed. Set the relevant training hyperparameters in the _ray.py file, then run python3 generate_embeddings_ray.py. This will run the training, generate the best embedding file and record the results.
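generate_embeddings_ray.py relies on Ray to run the training trials, and its internals are not described here. The sketch below shows only the underlying idea of trying every hyperparameter combination and keeping the best one, run sequentially without Ray. The search space (dim, lr, margin) and the train_and_score objective are hypothetical stand-ins for the real hyperparameters and for an actual training and evaluation run.

```python
"""Exhaustive grid search that keeps the best configuration.

Hypothetical: the real script distributes trials with Ray; here trials run
one after another, and train_and_score stands in for real training.
"""
import itertools
import json
from typing import Callable

SEARCH_SPACE = {
    "dim": [100, 300],
    "lr": [0.001, 0.01],
    "margin": [1.0, 5.0],
}


def train_and_score(config: dict) -> float:
    # Stand-in objective (higher is better). Replace with training the
    # embeddings and evaluating them.
    return config["dim"] / 100 - 100 * abs(config["lr"] - 0.001) - abs(config["margin"] - 5.0)


def grid_search(space: dict, objective: Callable[[dict], float]) -> tuple[dict, list[dict]]:
    keys = list(space)
    results = []
    for values in itertools.product(*(space[k] for k in keys)):
        config = dict(zip(keys, values))
        results.append({"config": config, "score": objective(config)})
    best = max(results, key=lambda r: r["score"])
    return best["config"], results


if __name__ == "__main__":
    best_config, all_results = grid_search(SEARCH_SPACE, train_and_score)
    print("best:", best_config)
    print(json.dumps(all_results, indent=2))  # record every trial
```

A full grid grows multiplicatively with each added hyperparameter, which is one reason to hand the trials to a scheduler such as Ray rather than loop over them as done here.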
The next task is to update the oracle and get the test_suite working. Move the new seedEmbedding.txt to the vocabulary folder and remove any prior files present there.
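A minimal sketch of this step, assuming the vocabulary folder should afterwards hold only seedEmbedding.txt (as stated above). The location of the trained embedding file is a placeholder, and the function name is hypothetical.

```python
"""Replace the contents of the vocabulary folder with a new seed embedding.

Sketch only: the trained file's location is a placeholder, and the rule
"vocabulary/ holds only seedEmbedding.txt afterwards" comes from the guide
text, not from IR2Vec's own tooling.
"""
import shutil
from pathlib import Path


def install_seed_embedding(trained_file: Path, vocab_dir: Path) -> Path:
    trained_file = trained_file.resolve()
    vocab_dir = vocab_dir.resolve()
    if trained_file.parent == vocab_dir:
        # Clearing the folder first would delete the file we want to keep.
        raise ValueError("trained_file must live outside vocab_dir")
    vocab_dir.mkdir(parents=True, exist_ok=True)
    # Remove every prior entry so an old vocabulary cannot be picked up.
    for entry in vocab_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    target = vocab_dir / "seedEmbedding.txt"
    shutil.move(str(trained_file), str(target))
    return target
```

For example, install_seed_embedding(Path("best_embedding.txt"), Path("vocabulary")), where best_embedding.txt stands for whatever file the training run produced.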
Then go to the src folder, which also contains the test-suite folder. Run both generate_llfiles.sh and generateOracle.sh with the appropriate version of LLVM.
Update CMakeLists.txt in the test-suite folder with the appropriate changes, and update the sanity_check.sh.cmake file with the appropriate paths for the vocabulary, LLVM version, etc. Then clear the build folder, regenerate its contents using the CMake call from the build process, and run make check. This should compile successfully.
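The name sanity_check.sh.cmake suggests a template that CMake's configure_file command turns into sanity_check.sh. Assuming that, and that it is configured with @ONLY, the sketch below previews the substitution so you can check that the vocabulary path and LLVM version come out as intended. It handles only @VAR@ placeholders (not #cmakedefine lines), and the variable names are hypothetical, not the ones the real template uses.

```python
"""Preview what a configure_file(... @ONLY) template expands to.

Assumptions: sanity_check.sh.cmake is a configure_file template processed
with @ONLY, so only @VAR@ placeholders are replaced and ${VAR} is kept.
#cmakedefine lines are not handled. Variable names in the example are
hypothetical.
"""
import re

PLACEHOLDER = re.compile(r"@([A-Za-z0-9_]+)@")


def configure(template: str, variables: dict[str, str]) -> str:
    # As in CMake, a variable that is not defined expands to an empty string.
    return PLACEHOLDER.sub(lambda m: variables.get(m.group(1), ""), template)


if __name__ == "__main__":
    template = 'VOCAB="@VOCAB_FILE@"\nLLVM_VER=@LLVM_VERSION@\necho ${HOME}\n'
    values = {"VOCAB_FILE": "vocabulary/seedEmbedding.txt", "LLVM_VERSION": "17"}
    print(configure(template, values), end="")
```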
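To find places in the code that still refer to the old LLVM version, git grep .. helps. For example, when upgrading from llvm16 to llvm17, we can run the following commands to spot any required version changes in the code:
git grep 16
git grep llvm16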
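git grep 16 casts a wide net and also matches unrelated numbers. A stricter, complementary pass can look only for version numbers attached to a tool name. The sketch below does that in Python; the token list (llvm, clang, clang++, opt) and the allowed separators (-, _ or none) are assumptions, and spaced forms such as "LLVM 16" are deliberately not matched, so the plain git grep commands above are still worth running.

```python
"""List lines that still mention an old LLVM version next to a tool name.

Assumptions: version references look like llvm16, llvm-16, LLVM_16,
clang-16, clang++-16 or opt-16. Spaced forms such as "LLVM 16" are not
matched, so this complements rather than replaces `git grep 16`.
"""
import re
import sys
from pathlib import Path


def version_pattern(old: int) -> re.Pattern:
    # Tool name, optional "-" or "_", the version, and no further digit
    # (so llvm160 is not reported when looking for 16).
    return re.compile(rf"\b(?:llvm|clang\+\+|clang|opt)[-_]?{old}(?!\d)", re.IGNORECASE)


def find_version_refs(root: Path, old: int) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for every match."""
    pattern = version_pattern(old)
    hits = []
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or ".git" in rel.parts:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(rel), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Usage: python3 find_version_refs.py <repo-root> <old-version>
    if len(sys.argv) == 3:
        for rel, lineno, line in find_version_refs(Path(sys.argv[1]), int(sys.argv[2])):
            print(f"{rel}:{lineno}: {line}")
```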
Update test.yml in the GitHub workflows to the new LLVM version as well.
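Assuming test.yml refers to the LLVM version through names such as llvm-16 or clang-16, the update can be sketched as a guarded find-and-replace. The helper below is hypothetical, and its token pattern is an assumption about what the workflow file contains; review the diff before committing.

```python
"""Rewrite versioned tool names (e.g. llvm-16 -> llvm-17) in a workflow file.

Hypothetical helper: the token pattern is an assumption about how test.yml
refers to the LLVM version. Review the resulting diff before committing.
"""
import re
import sys
from pathlib import Path


def bump_versions(text: str, old: int, new: int) -> tuple[str, int]:
    """Return the rewritten text and the number of replacements made."""
    pattern = re.compile(rf"\b((?:llvm|clang\+\+|clang|opt)[-_]?){old}(?!\d)", re.IGNORECASE)
    # A function replacement avoids group-reference ambiguity such as "\117".
    return pattern.subn(lambda m: f"{m.group(1)}{new}", text)


if __name__ == "__main__":
    # Usage: python3 bump_versions.py path/to/test.yml 16 17
    if len(sys.argv) == 4:
        path = Path(sys.argv[1])
        updated, count = bump_versions(path.read_text(), int(sys.argv[2]), int(sys.argv[3]))
        path.write_text(updated)
        print(f"{count} replacement(s) in {path}")
```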
Finally, install pre-commit in a fresh conda environment, then run pre-commit install, followed by pre-commit run --all-files. Once all this is done, you should be able to push the commits without any test failures.