PatchPinpointer: Enhancing Security Patch Localization through Hard Negative Mining and Vulnerability Representation Learning
PatchPinpointer is an approach that integrates hard negative mining with vulnerability-specific representation learning. It combines lexical similarity measurement with security-fix probability estimation to mine commits that are both textually relevant and characteristic of security patches. It then uses a dual-encoder architecture with CVE-guided cross-attention and gating mechanisms to generate distinct commit representations, and applies contrastive learning to establish precise mappings between CVEs and patches.
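As a rough illustration of the contrastive objective mentioned above, the following sketch computes an InfoNCE-style loss for one CVE embedding, its true patch embedding, and a set of hard negative commit embeddings. This is not taken from the repository: the function names, the temperature value, and the use of plain Python lists in place of encoder outputs are all assumptions made for the example.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce_loss(cve_vec, patch_vec, negative_vecs, temperature=0.1):
    """InfoNCE-style loss for a single CVE (hypothetical helper).

    The true patch is placed at index 0 of the candidate list; the loss is
    the negative log-softmax of its cosine similarity against all candidates.
    """
    query = normalize(cve_vec)
    candidates = [normalize(patch_vec)] + [normalize(n) for n in negative_vecs]
    logits = [dot(query, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    cve = [1.0, 0.0]
    print(info_nce_loss(cve, [1.0, 0.0], [[0.0, 1.0]]))  # patch aligned with CVE
    print(info_nce_loss(cve, [0.0, 1.0], [[1.0, 0.0]]))  # negative aligned with CVE
```

The loss is small when the CVE embedding is closer to its patch than to any negative, and large when a negative is closer, which is why harder negatives give a stronger training signal.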
First, measure the lexical similarity:
cd code
python tf_idf_similarity.py
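For readers unfamiliar with the lexical similarity step, here is a minimal standard-library sketch of TF-IDF cosine similarity between a CVE description and candidate commit messages. It does not reproduce tf_idf_similarity.py: the tokenizer, the smoothed IDF formula, and the function names are assumptions chosen to keep the example self-contained.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens (an assumed tokenizer)."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so terms present in every document keep a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    num = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return num / (norm_u * norm_v)


def rank_commits(cve_description, commit_messages):
    """Return (commit_index, similarity) pairs, most similar first."""
    vectors = tfidf_vectors([cve_description] + commit_messages)
    query, commits = vectors[0], vectors[1:]
    scores = [cosine(query, c) for c in commits]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    commits = [
        "update readme and docs",
        "fix heap buffer overflow in png decoder",
        "refactor png decoder tests",
    ]
    for index, score in rank_commits("heap buffer overflow in png decoder", commits):
        print(index, round(score, 3))
```

Commits that share rare terms with the CVE description rank highest, while commits with no shared vocabulary score zero.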
Second, estimate the security-fix probability. Train the SPD model first:
sh run_spd.sh
Third, filter hard negative samples:
python filter_topk.py
Finally, run pretraining:
sh run_pretrain.sh
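The hard negative filtering step can be pictured with the sketch below, which keeps the top-k non-patch commits that score highly on both lexical similarity and security-fix probability. How filter_topk.py actually combines the two scores is not stated here, so the product used below, the tuple layout, and the function name are assumptions for illustration only.

```python
def select_hard_negatives(candidates, true_patch_id, k=5):
    """Pick the k hardest negative commits for one CVE (hypothetical helper).

    candidates: iterable of (commit_id, lexical_similarity, spd_probability).
    The combined score is the product of the two values (an assumption), so a
    commit must be both textually relevant and patch-like to rank highly.
    """
    scored = [
        (commit_id, similarity * probability)
        for commit_id, similarity, probability in candidates
        if commit_id != true_patch_id  # the real patch is never a negative
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [commit_id for commit_id, _ in scored[:k]]


if __name__ == "__main__":
    candidates = [
        ("a", 0.9, 0.9),       # relevant and patch-like: hard negative
        ("b", 0.8, 0.1),       # relevant but unlikely to be a security fix
        ("c", 0.2, 0.95),      # patch-like but textually unrelated
        ("d", 0.7, 0.8),       # fairly hard negative
        ("patch", 0.95, 0.99), # ground-truth patch, excluded
    ]
    print(select_hard_negatives(candidates, "patch", k=2))
```

Using a product rather than a sum means a commit with a near-zero value on either signal is dropped, which matches the stated goal of mining commits that satisfy both criteria at once.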