IR2Vec
This guide details the steps for upgrading the LLVM version supported by IR2Vec.
IR2Vec-Version-Upgrade-Checks has the required scripts to be run for this process. The scripts are under collect_ir:
- collect_ir/spec/get_ll_files_list.py
- collect_ir/boost/get_ll_files_list.py
- collect_ir/spec/get_ll_spec.sh

The first step is to generate the IR files:
- Compile the .c* files with the relevant LLVM version.
- get_boost_ll.py is provided for this purpose.
- Collect the paths to all these compiled .ll / .o files in a single place using the scripts collect_ir/spec/get_ll_files_list.py and collect_ir/boost/get_ll_files_list.py.
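The path-collection step above can be sketched in a few lines of Python. This is a minimal illustration only, not the contents of get_ll_files_list.py: the function names collect_ll_paths and write_paths_file are hypothetical, and it assumes the compiled files sit somewhere under a single root directory.

```python
from pathlib import Path


def collect_ll_paths(root, suffixes=(".ll", ".o")):
    """Return sorted absolute paths of files under `root` with a matching suffix.

    Hypothetical helper for illustration; the real scripts may filter differently.
    """
    root_path = Path(root)
    return sorted(
        str(p.resolve())
        for p in root_path.rglob("*")
        if p.is_file() and p.suffix in suffixes
    )


def write_paths_file(paths, out_file):
    """Write one path per line to `out_file` and return how many were written."""
    Path(out_file).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)
```

Keeping the list in one text file, one path per line, makes it easy to hand the same input to later steps regardless of which benchmark the files came from.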
Once we have the .ll file paths, we go to the seed_embeddings folder in the main IR2Vec repository. Here, our process will have to involve the following tasks.
- Run the triplets.sh bash file, with the relevant changes for the new LLVM version. Instructions to run it are available at seed_embeddings/README.md.
- Run the preprocessing script in the openKE folder, at IR2Vec/seed_embeddings/OpenKE/preprocess.py.
- Check the preprocessed folder. Inside this folder, we should have the relevant preprocessed data generated.
- The embeddings folder here will be relevant for the next step.

We have recently retrained the IR2Vec embeddings with a larger dataset, the ComPile dataset. It is a collection of LLVM IR files from open-source projects, and is considerably larger than the dataset used in the original IR2Vec paper. Further details about the retraining process can be found here.
Once the trainIDs, relations and entities files are generated, we can use them as is in the training process as described before.
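To make the role of the trainIDs, relations and entities files concrete, here is a minimal sketch of how string triplets can be turned into integer IDs. It is not the logic of preprocess.py: the function name build_id_maps is hypothetical, the input is assumed to be (head, relation, tail) tuples, and the output row order (head, tail, relation) is an assumption borrowed from the OpenKE benchmark convention.

```python
def build_id_maps(triplets):
    """Assign integer IDs to entities and relations in first-seen order.

    `triplets` is an iterable of (head, relation, tail) strings (assumed layout).
    Returns (entity2id, relation2id, train_ids), where each row of train_ids is
    (head_id, tail_id, relation_id).
    """
    entity2id, relation2id, train_ids = {}, {}, []
    for head, relation, tail in triplets:
        # setdefault keeps an existing ID and only assigns a new one on first sight
        h = entity2id.setdefault(head, len(entity2id))
        t = entity2id.setdefault(tail, len(entity2id))
        r = relation2id.setdefault(relation, len(relation2id))
        train_ids.append((h, t, r))
    return entity2id, relation2id, train_ids


# Illustrative triplets only; the entity and relation names are made up.
example = [
    ("add", "Type", "integerTy"),
    ("add", "Next", "ret"),
    ("ret", "Type", "voidTy"),
]
entities, relations, rows = build_id_maps(example)
```

Because the IDs depend only on first-seen order, regenerating them from the same triplet file gives the same mapping, which helps when comparing runs across LLVM versions.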
For training, use the generate_embeddings_ray.py file in the openKE folder.
- Use the openKE.yaml file to create the conda environment.
- Make sure Ray and tensorflow are installed.
- Update the _ray.py file with the relevant training hyperparameters.
- Run python3 generate_embeddings_ray.py. This will run the training, generate the best embedding file and record the results.

Next, generate the oracle and get the test_suite working.
- Take seedEmbedding.txt and move it to the vocabulary folder. Remove any prior files present there.
- Go to the src folder. This folder contains the test-suite folder as well.
- Locate generate_llfiles.sh and generateOracle.sh. Run both of these files with the appropriate version of LLVM.
- Update CMakeLists.txt in the test-suite folder with the appropriate changes.
- Update the sanity_check.sh.cmake file with the appropriate paths for vocabulary, LLVM version, etc.
- Go to the build folder. Regenerate its contents using the CMake call from the build process.
- Run make check. This should compile successfully.

Finally, update the remaining version references.
- To find them, git grep .. helps.
- For example, when moving from llvm16 to llvm17, we can run the following commands to spot any required version changes in the code.
    git grep 16
    git grep llvm16
- Update test.yml in the GitHub workflows.
- Install pre-commit in a fresh conda environment.
- Run pre-commit install, followed by pre-commit run --all-files.

Once all this is done, you should be able to push the commits without any test failures.
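The version search from the steps above (git grep 16, git grep llvm16) can be approximated in Python when the text to scan is already in memory. This is a sketch under stated assumptions, not a replacement for git grep: the function name find_version_mentions is hypothetical, and unlike a plain git grep 16 it deliberately skips longer numbers such as 160 or 2016 to reduce noise.

```python
import re


def find_version_mentions(lines, old_version):
    """Return (line_number, line) pairs that mention `old_version` as a whole number.

    A digit on either side of the match disqualifies it, so "16" matches
    "llvm16" and "clang-16" but not "160" or "2016".
    """
    pattern = re.compile(rf"(?<!\d){re.escape(str(old_version))}(?!\d)")
    return [
        (number, line)
        for number, line in enumerate(lines, start=1)
        if pattern.search(line)
    ]


# Illustrative input only; these lines are not taken from the IR2Vec sources.
sample = [
    "find_package(LLVM 16 REQUIRED)",
    "set(X 160)",
    "clang-16 -S",
    "llvm16",
    "year 2016",
]
hits = find_version_mentions(sample, 16)
```

The digit-boundary check is a design choice for this sketch; when in doubt, the broader git grep output is the safer thing to review by hand.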