- 📆[2025-11-17] We are pleased to announce our latest work, PACEbench: A Framework for Evaluating Practical AI Cyber-Exploitation Capabilities, a comprehensive benchmark for evaluating the capabilities of LLM-based agents in real-world cybersecurity scenarios.
- 📆[2025-11-17] We have updated the related papers up to 2025/08/31, with 176 new papers added (2025.03.01-2025.08.31).
- 📆[2025-03-03] We have updated the related papers up to 2025/02/28, with 33 new papers added (2025.01.01-2025.02.28).
- 📆[2025-01-21] We have updated the related papers up to 2024/12/31, with 74 new papers added (2024.09.01-2024.12.31).
- 📆[2025-01-08] We have added the publication venue for each paper.
- 📆[2024-09-21] We have updated the related papers up to 2024/08/31, with 75 new papers added (2024.06.01-2024.08.31).
- When LLMs Meet Cybersecurity: A Systematic Literature Review
- 🔥 Updates
- 🌈 Introduction
- 🚩 Features
- 📜 Literatures
- 📖 BibTeX
- ⭐ Star History
We are excited to present "When LLMs Meet Cybersecurity: A Systematic Literature Review," a comprehensive overview of LLM applications in cybersecurity.
We seek to address three key questions:
- RQ1: How can we construct cybersecurity-oriented domain LLMs?
- RQ2: What are the potential applications of LLMs in cybersecurity?
- RQ3: What are the existing challenges and further research directions for the application of LLMs in cybersecurity?
(2024.08.20) Our study analyzes over 300 works, spanning 25+ LLMs and more than 10 downstream scenarios.
- AICrypto: A Comprehensive Benchmark For Evaluating Cryptography Capabilities of Large Language Models | arXiv | 2025.07.13 | Paper Link
- ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation | arXiv | 2025.07.14 | Paper Link
- DefenderBench: A Toolkit for Evaluating Language Agents in Cybersecurity Environments | arXiv | 2025.06.10 | Paper Link
- CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale | arXiv | 2025.06.03 | Paper Link
- DFIR-Metric: A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response | arXiv | 2025.05.26 | Paper Link
- VADER: A Human-Evaluated Benchmark for Vulnerability Assessment, Detection, Explanation, and Remediation | arXiv | 2025.05.26 | Paper Link
- BinMetric: A Comprehensive Binary Analysis Benchmark for Large Language Models | arXiv | 2025.05.12 | Paper Link
- The Digital Cybersecurity Expert: How Far Have We Come? | arXiv | 2025.04.16 | Paper Link
- On Benchmarking Code LLMs for Android Malware Analysis | arXiv | 2025.04.01 | Paper Link
- CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities | arXiv | 2025.03.21 | Paper Link
- Benchmarking LLMs and LLM-based Agents in Practical Vulnerability Detection for Code Repositories | arXiv | 2025.03.05 | Paper Link
- AttackSeqBench: Benchmarking Large Language Models' Understanding of Sequential Patterns in Cyber Attacks | arXiv | 2025.03.05 | Paper Link
- CyberLLMInstruct: A New Dataset for Analysing Safety of Fine-Tuned LLMs Using Cyber Security Data | arXiv | 2025.03.12 | Paper Link
- Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training | arXiv | 2025.02.16 | Paper Link
- ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks | arXiv | 2025.02.07 | Paper Link
- SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity | arXiv | 2024.12.31 | Paper Link
- AI Cyber Risk Benchmark: Automated Exploitation Capabilities | arXiv | 2024.12.09 | Paper Link
- CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity | arXiv | 2024.11.25 | Paper Link
- AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset | arXiv | 2024.08.09 | Paper Link
- CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models | arXiv | 2024.08.03 | Paper Link
- eyeballvul: a future-proof benchmark for vulnerability detection in the wild | arXiv | 2024.07.11 | Paper Link
- NYU CTF Dataset: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | arXiv | 2024.06.09 | Paper Link
- SECURE: Benchmarking Generative Large Language Models for Cybersecurity Advisory | arXiv | 2024.05.30 | Paper Link
- Assessing Cybersecurity Vulnerabilities in Code Large Language Models | arXiv | 2024.04.29 | Paper Link
- Can LLMs Understand Computer Networks? Towards a Virtual System Administrator | arXiv | 2024.04.22 | Paper Link
- LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations | IEEE/ACM International Conference on Mining Software Repositories | 2023.03.16 | Paper Link
- OpsEval: A Comprehensive IT Operations Benchmark Suite for Large Language Models | arXiv | 2024.02.16 | Paper Link
- Can LLMs Patch Security Issues? | arXiv | 2024.02.19 | Paper Link
- CyberMetric: A Benchmark Dataset for Evaluating Large Language Models Knowledge in Cybersecurity | arXiv | 2024.02.12 | Paper Link
- DebugBench: Evaluating Debugging Capability of Large Language Models | ACL Findings | 2024.01.11 | Paper Link
- SecurityEval Dataset: Mining Vulnerability Examples to Evaluate Machine Learning-Based Code Generation Techniques | Proceedings of the 1st International Workshop on Mining Software Repositories Applications for Privacy and Security | 2022.11.09 | Paper Link
- SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security | arXiv | 2023.12.26 | Paper Link
- Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models | arXiv | 2023.12.07 | Paper Link
- An Empirical Study of NetOps Capability of Pre-trained Large Language Models | arXiv | 2023.09.19 | Paper Link
- SecEval: A Comprehensive Benchmark for Evaluating Cybersecurity Knowledge of Foundation Models | GitHub | 2023 | Paper Link
- Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report | arXiv | 2025.08.01 | Paper Link
- Cyber-Zero: Training Cybersecurity Agents without Runtime | arXiv | 2025.07.29 | Paper Link
- PhishIntentionLLM: Uncovering Phishing Website Intentions through Multi-Agent Retrieval-Augmented Generation | arXiv | 2025.07.21 | Paper Link
- Less Data, More Security: Advancing Cybersecurity LLMs Specialization via Resource-Efficient Domain-Adaptive Continuous Pre-training with Minimal Tokens | arXiv | 2025.06.30 | Paper Link
- Large Language Model-driven Security Assistant for Internet of Things via Chain-of-Thought | arXiv | 2025.05.08 | Paper Link
- Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report | arXiv | 2025.04.28 | Paper Link
- TrafficLLM: Enhancing Large Language Models for Network Traffic Analysis with Generic Traffic Representation | arXiv | 2025.04.05 | Paper Link
- CyberBOT: Towards Reliable Cybersecurity Education via Ontology-Grounded Retrieval Augmented Generation | arXiv | 2025.04.01 | Paper Link
- Phishsense-1B: A Technical Perspective on an AI-Powered Phishing Detection Model | arXiv | 2025.03.14 | Paper Link
- ELTEX: A Framework for Domain-Driven Synthetic Data Generation | arXiv | 2025.03.19 | Paper Link
- Fine-tuning Large Language Models for DGA and DNS Exfiltration Detection | arXiv | 2024.11.07 | Paper Link
- AttackQA: Development and Adoption of a Dataset for Assisting Cybersecurity Operations using Fine-tuned and Open-Source LLMs | arXiv | 2024.11.02 | Paper Link
- Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments | arXiv | 2024.09.17 | Paper Link
- CyberPal.AI: Empowering LLMs with Expert-Driven Cybersecurity Instructions | arXiv | 2024.08.18 | Paper Link
- IoT-LM: Large Multisensory Language Models for the Internet of Things | arXiv | 2024.07.13 | Paper Link
- A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Automated Program Repair | arXiv | 2024.06.09 | Paper Link
- Security Vulnerability Detection with Multitask Self-Instructed Fine-Tuning of Large Language Models | arXiv | 2024.06.09 | Paper Link
- Transforming Computer Security and Public Trust Through the Exploration of Fine-Tuning Large Language Models | arXiv | 2024.06.02 | Paper Link
- Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns | arXiv | 2024.04.30 | Paper Link
- Nova+: Generative Language Models for Binaries | arXiv | 2023.11.27 | Paper Link
- Instruction Tuning for Secure Code Generation | ICML | 2024.02.14 | Paper Link
- Efficient Avoidance of Vulnerabilities in Auto-completed Smart Contract Code Using Vulnerability-constrained Decoding | ISSRE | 2023.10.06 | Paper Link
- RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair | arXiv | 2024.03.11 | Paper Link
- Finetuning Large Language Models for Vulnerability Detection | arXiv | 2024.02.29 | Paper Link
- Large Language Models for Test-Free Fault Localization | ICSE | 2023.10.03 | Paper Link
- HackMentor: Fine-tuning Large Language Models for Cybersecurity | TrustCom | 2023.09 | Paper Link
- Owl: A Large Language Model for IT Operations | ICLR | 2023.09.17 | Paper Link
- SecureFalcon: The Next Cyber Reasoning System for Cyber Security | arXiv | 2023.07.13 | Paper Link
- DroidTTP: Mapping Android Applications with TTP for Cyber Threat Intelligence | arXiv | 2025.03.20 | Paper Link
- A Systematic Approach to Predict the Impact of Cybersecurity Vulnerabilities Using LLMs | arXiv | 2025.08.25 | Paper Link
- Enabling Transparent Cyber Threat Intelligence Combining Large Language Models and Domain Ontologies | arXiv | 2025.08.27 | Paper Link
- False Alarms, Real Damage: Adversarial Attacks Using LLM-based Models on Text-based Cyber Threat Intelligence Systems | arXiv | 2025.07.05 | Paper Link
- LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification | arXiv | 2025.07.15 | Paper Link
- Towards Effective Identification of Attack Techniques in Cyber Threat Intelligence Reports using Large Language Models | arXiv | 2025.05.05 | Paper Link
- Can We Enhance Bug Report Quality Using LLMs?: An Empirical Study of LLM-Based Bug Report Generation | arXiv | 2025.04.26 | Paper Link
- MaLAware: Automating the Comprehension of Malicious Software Behaviours using Large Language Models (LLMs) | arXiv | 2025.04.01 | Paper Link
- LLM-Assisted Proactive Threat Intelligence for Automated Reasoning | arXiv | 2025.04.01 | Paper Link
- Large Language Models are Unreliable for Cyber Threat Intelligence | arXiv | 2025.03.29 | Paper Link
- Cyber Defense Reinvented: Large Language Models as Threat Intelligence Copilots | arXiv | 2025.02.28 | Paper Link
- Labeling NIDS Rules with MITRE ATT&CK Techniques: Machine Learning vs. Large Language Models | arXiv | 2024.12.16 | Paper Link
- IntellBot: Retrieval Augmented LLM Chatbot for Cyber Threat Knowledge Delivery | arXiv | 2024.11.08 | Paper Link
- CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity | arXiv | 2024.10.28 | Paper Link
- AI-Driven Cyber Threat Intelligence Automation | arXiv | 2024.10.27 | Paper Link
- Cyber Knowledge Completion Using Large Language Models | arXiv | 2024.09.24 | Paper Link
- Evaluating the Usability of LLMs in Threat Intelligence Enrichment | arXiv | 2024.09.23 | Paper Link
- KGV: Integrating Large Language Models with Knowledge Graphs for Cyber Threat Intelligence Credibility Assessment | arXiv | 2024.08.15 | Paper Link
- Usefulness of data flow diagrams and large language models for security threat validation: a registered report | arXiv | 2024.08.14 | Paper Link
- A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution | arXiv | 2024.08.12 | Paper Link
- The Use of Large Language Models (LLM) for Cyber Threat Intelligence (CTI) in Cybercrime Forums | arXiv | 2024.08.08 | Paper Link
- Psychological Profiling in Cybersecurity: A Look at LLMs and Psycholinguistic Features | arXiv | 2024.08.09 | Paper Link
- Using LLMs to Automate Threat Intelligence Analysis Workflows in Security Operation Centers | arXiv | 2024.07.18 | Paper Link
- LLMCloudHunter: Harnessing LLMs for Automated Extraction of Detection Rules from Cloud-Based CTI | arXiv | 2024.07.06 | Paper Link
- Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models | arXiv | 2024.06.30 | Paper Link
- AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models | EuroS&P Workshop | 2024.05.08 | Paper Link
- SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence | arXiv | 2024.05.06 | Paper Link
- Crimson: Empowering Strategic Reasoning in Cybersecurity through Large Language Models | arXiv | 2024.03.01 | Paper Link
- Evaluation of LLM Chatbots for OSINT-based Cyber Threat Awareness | Expert Syst. Appl. | 2024.03.13 | Paper Link
- LOCALINTEL: Generating Organizational Threat Intelligence from Global and Local Cyber Knowledge | arXiv | 2024.01.18 | Paper Link
- Advancing TTP Analysis: Harnessing the Power of Encoder-Only and Decoder-Only Language Models with Retrieval Augmented Generation | arXiv | 2024.01.12 | Paper Link
- ChatGPT, Llama, can you write my report? An experiment on assisted digital forensics reports written using (Local) Large Language Models | Forensic Sci. Int. Digit. Investig. | 2023.12.22 | Paper Link
- HW-V2W-Map: Hardware Vulnerability to Weakness Mapping Framework for Root Cause Analysis with GPT-assisted Mitigation Suggestion | arXiv | 2023.12.21 | Paper Link
- AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation | BigData | 2023.10.04 | Paper Link
- Cyber Sentinel: Exploring Conversational Agents in Streamlining Security Tasks with GPT-4 | arXiv | 2023.09.28 | Paper Link
- Cupid: Leveraging ChatGPT for More Accurate Duplicate Bug Report Detection | arXiv | 2023.08.27 | Paper Link
- On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions | arXiv | 2023.08.22 | Paper Link
- An Empirical Study on Using Large Language Models to Analyze Software Supply Chain Security Failures | Proceedings of the 2023 Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses | 2023.08.09 | Paper Link
- Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild | arXiv | 2023.07.14 | Paper Link
- LLM-Assisted Model-Based Fuzzing of Protocol Implementations | arXiv | 2025.08.03 | Paper Link
- Fuzzing: Randomness? Reasoning! Efficient Directed Fuzzing via Large Language Models | arXiv | 2025.06.30 | Paper Link
- Directed Greybox Fuzzing via Large Language Model | arXiv | 2025.05.06 | Paper Link
- ToolFuzz -- Automated Agent Tool Testing | arXiv | 2025.03.06 | Paper Link
- Towards Reliable LLM-Driven Fuzz Testing: Vision and Road Ahead | arXiv | 2025.03.02 | Paper Link
- Your Fix Is My Exploit: Enabling Comprehensive DL Library API Fuzzing with Large Language Models | arXiv | 2025.01.08 | Paper Link
- Large Language Model assisted Hybrid Fuzzing | arXiv | 2024.12.19 | Paper Link
- Harnessing Large Language Models for Seed Generation in Greybox Fuzzing | arXiv | 2024.11.27 | Paper Link
- ChatHTTPFuzz: Large Language Model-Assisted IoT HTTP Fuzzing | arXiv | 2024.11.18 | Paper Link
- AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing | arXiv | 2024.11.05 | Paper Link
- FuzzCoder: Byte-level Fuzzing Test via Large Language Model | arXiv | 2024.09.03 | Paper Link
- An Exploratory Study on Using Large Language Models for Mutation Testing | arXiv | 2024.06.14 | Paper Link
- Prompt Fuzzing for Fuzz Driver Generation | ACM CCS 2024 | 2024.05.29 | Paper Link
- When Fuzzing Meets LLMs: Challenges and Opportunities | ACM International Conference on the Foundations of Software Engineering | 2024.04.25 | Paper Link
- Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing | USENIX | 2024.03.06 | Paper Link
- Large Language Model Guided Protocol Fuzzing | NDSS | 2024.02.26 | Paper Link
- Fuzz4All: Universal Fuzzing with Large Language Models | ICSE | 2024.01.15 | Paper Link
- How Well Does LLM Generate Security Tests? | arXiv | 2023.10.03 | Paper Link
- CODAMOSA: Escaping Coverage Plateaus in Test Generation with Pre-trained Large Language Models | ICSE | 2023.07.26 | Paper Link
- Understanding Large Language Model Based Fuzz Driver Generation | arXiv | 2023.07.24 | Paper Link
- Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models | ISSTA | 2023.06.07 | Paper Link
- Augmenting Greybox Fuzzing with Generative AI | arXiv | 2023.06.11 | Paper Link
- Large Language Models are Edge-Case Fuzzers: Testing Deep Learning Libraries via FuzzGPT | arXiv | 2023.04.04 | Paper Link
- SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks | arXiv | 2025.06.13 | Paper Link
- Large Language Models Versus Static Code Analysis Tools: A Systematic Benchmark for Vulnerability Detection | arXiv | 2025.08.06 | Paper Link
- A Systematic Literature Review on Detecting Software Vulnerabilities with Large Language Models | arXiv | 2025.07.30 | Paper Link
- Out of Distribution, Out of Luck: How Well Can LLMs Trained on Vulnerability Datasets Detect Top 25 CWE Weaknesses? | arXiv | 2025.07.29 | Paper Link
- LLMxCPG: Context-Aware Vulnerability Detection Through Code Property Graph-Guided Large Language Models | USENIX | 2025.07.22 | Paper Link
- Revisiting Pre-trained Language Models for Vulnerability Detection | arXiv | 2025.07.22 | Paper Link
- MalCodeAI: Autonomous Vulnerability Detection and Remediation via Language Agnostic Code Reasoning | arXiv | 2025.07.15 | Paper Link
- Identifying Helpful Context for LLM-based Vulnerability Repair: A Preliminary Study | arXiv | 2025.06.13 | Paper Link
- VulStamp: Vulnerability Assessment using Large Language Model | arXiv | 2025.06.13 | Paper Link
- Large Language Models for Multilingual Vulnerability Detection: How Far Are We? | arXiv | 2025.06.09 | Paper Link
- Boosting Vulnerability Detection of LLMs via Curriculum Preference Optimization with Synthetic Reasoning Data | arXiv | 2025.06.09 | Paper Link
- Let the Trial Begin: A Mock-Court Approach to Vulnerability Detection using LLM-Based Agents | arXiv | 2025.05.16 | Paper Link
- A Preliminary Study of Large Language Models for Multilingual Vulnerability Detection | arXiv | 2025.05.12 | Paper Link
- Enhancing Large Language Models with Faster Code Preprocessing for Vulnerability Detection | arXiv | 2025.05.08 | Paper Link
- LASHED: LLMs And Static Hardware Analysis for Early Detection of RTL Bugs | arXiv | 2025.04.30 | Paper Link
- LLMpatronous: Harnessing the Power of LLMs For Vulnerability Detection | arXiv | 2025.04.25 | Paper Link
- Context-Enhanced Vulnerability Detection Based on Large Language Model | arXiv | 2025.04.23 | Paper Link
- Automated Static Vulnerability Detection via a Holistic Neuro-symbolic Approach | arXiv | 2025.04.22 | Paper Link
- Everything You Wanted to Know About LLM-based Vulnerability Detection But Were Afraid to Ask | arXiv | 2025.04.18 | Paper Link
- MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models | arXiv | 2025.04.16 | Paper Link
- Malware analysis assisted by AI with R2AI | arXiv | 2025.04.10 | Paper Link
- Large Language Model (LLM) for Software Security: Code Analysis, Malware Analysis, Reverse Engineering | arXiv | 2025.04.08 | Paper Link
- CVE-Bench: Benchmarking LLM-based Software Engineering Agent's Ability to Repair Real-World CVE Vulnerabilities | NAACL | 2025.03 | Paper Link
- Reasoning with LLMs for Zero-Shot Vulnerability Detection | arXiv | 2025.03.22 | Paper Link
- Vulnerability Detection: From Formal Verification to Large Language Models and Hybrid Approaches: A Comprehensive Overview | arXiv | 2025.03.13 | Paper Link
- CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection | arXiv | 2025.03.12 | Paper Link
- Benchmarking Large Language Models for Multi-Language Software Vulnerability Detection | arXiv | 2025.03.03 | Paper Link
- CVE-LLM: Ontology-Assisted Automatic Vulnerability Evaluation Using Large Language Models | arXiv | 2025.02.21 | Paper Link
- Large Language Models in Software Security: A Survey of Vulnerability Detection Techniques and Insights | arXiv | 2025.02.10 | Paper Link
- Large Language Models for In-File Vulnerability Localization Can Be "Lost in the End" | arXiv | 2025.02.09 | Paper Link
- Streamlining Security Vulnerability Triage with Large Language Models | arXiv | 2025.01.31 | Paper Link
- Evaluating Large Language Models in Vulnerability Detection Under Variable Context Windows | arXiv | 2025.01.30 | Paper Link
- Helping LLMs Improve Code Generation Using Feedback from Testing and Static Analysis | arXiv | 2025.01.07 | Paper Link
- CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection | arXiv | 2025.01.08 | Paper Link
- Leveraging Large Language Models and Machine Learning for Smart Contract Vulnerability Detection | arXiv | 2025.01.04 | Paper Link
- Investigating Large Language Models for Code Vulnerability Detection: An Experimental Study | arXiv | 2024.12.24 | Paper Link
- Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection | arXiv | 2024.12.16 | Paper Link
- ChatNVD: Advancing Cybersecurity Vulnerability Assessment with Large Language Models | arXiv | 2024.12.06 | Paper Link
- CleanVul: Automatic Function-Level Vulnerability Detection in Code Commits Using LLM Heuristics | arXiv | 2024.11.26 | Paper Link
- EnStack: An Ensemble Stacking Framework of Large Language Models for Enhanced Vulnerability Detection in Source Code | arXiv | 2024.11.25 | Paper Link
- CryptoFormalEval: Integrating LLMs and Formal Verification for Automated Cryptographic Protocol Vulnerability Detection | arXiv | 2024.11.20 | Paper Link
- Beyond Static Tools: Evaluating Large Language Models for Cryptographic Misuse Detection | arXiv | 2024.11.14 | Paper Link
- LProtector: An LLM-driven Vulnerability Detection System | arXiv | 2024.11.04 | Paper Link
- Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries | arXiv | 2024.11.07 | Paper Link
- ProveRAG: Provenance-Driven Vulnerability Analysis with Automated Retrieval-Augmented LLMs | arXiv | 2024.10.22 | Paper Link
- RealVul: Can We Detect Vulnerabilities in Web Applications with LLM? | arXiv | 2024.10.10 | Paper Link
- Code Vulnerability Repair with Large Language Model using Context-Aware Prompt Tuning | arXiv | 2024.09.27 | Paper Link
- Boosting Cybersecurity Vulnerability Scanning based on LLM-supported Static Application Security Testing | arXiv | 2024.09.24 | Paper Link
- VulnLLMEval: A Framework for Evaluating Large Language Models in Software Vulnerability Detection and Patching | arXiv | 2024.09.17 | Paper Link
- Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models | arXiv | 2024.09.16 | Paper Link
- Exploring LLMs for Malware Detection: Review, Framework Design, and Countermeasure Approaches | arXiv | 2024.09.11 | Paper Link
- SAFE: Advancing Large Language Models in Leveraging Semantic and Syntactic Relationships for Software Vulnerability Detection | arXiv | 2024.09.02 | Paper Link
- Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection | European Symposium on Research in Computer Security | 2024.08.29 | Paper Link
- ANVIL: Anomaly-based Vulnerability Identification without Labelled Training Data | arXiv | 2024.08.28 | Paper Link
- LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions | arXiv | 2024.08.14 | Paper Link
- Exploring RAG-based Vulnerability Augmentation with LLMs | arXiv | 2024.08.08 | Paper Link
- Harnessing the Power of LLMs in Source Code Vulnerability Detection | arXiv | 2024.08.07 | Paper Link
- Towards Effectively Detecting and Explaining Vulnerabilities Using Large Language Models | arXiv | 2024.08.08 | Paper Link
- Comparison of Static Application Security Testing Tools and Large Language Models for Repo-level Vulnerability Detection | arXiv | 2024.07.23 | Paper Link
- SCoPE: Evaluating LLMs for Software Vulnerability Detection | arXiv | 2024.07.19 | Paper Link
- Static Detection of Filesystem Vulnerabilities in Android Systems | arXiv | 2024.07.16 | Paper Link
- Detect Llama -- Finding Vulnerabilities in Smart Contracts using Large Language Models | Information Security and Privacy | 2024.07.12 | Paper Link
- Assessing the Effectiveness of LLMs in Android Application Vulnerability Analysis | arXiv | 2024.06.27 | Paper Link
- MALSIGHT: Exploring Malicious Source Code and Benign Pseudocode for Iterative Binary Malware Summarization | arXiv | 2024.06.26 | Paper Link
- Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG | arXiv | 2024.06.19 | Paper Link
- Generalization-Enhanced Code Vulnerability Detection via Multi-Task Instruction Fine-Tuning | ACL Findings | 2024.06.06 | Paper Link
- LLM-Assisted Static Analysis for Detecting Security Vulnerabilities | arXiv | 2024.05.27 | Paper Link
- Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study | arXiv | 2024.05.24 | Paper Link
- DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection | Journal of Systems and Software | 2024.05.02 | Paper Link
- Large Language Model for Vulnerability Detection and Repair: Literature Review and Roadmap | arXiv | 2024.04.04 | Paper Link
- How Far Have We Gone in Vulnerability Detection Using Large Language Models | arXiv | 2023.12.22 | Paper Link
- The FormAI Dataset: Generative AI in Software Security through the Lens of Formal Verification | International Conference on Predictive Models and Data Analytics in Software Engineering | 2023.09.02 | Paper Link
- DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection | International Symposium on Research in Attacks, Intrusions and Defenses | 2023.08.09 | Paper Link
- How ChatGPT is Solving Vulnerability Management Problem | arXiv | 2023.11.11 | Paper Link
- Multi-role Consensus through LLMs Discussions for Vulnerability Detection | arXiv | 2024.03.21 | Paper Link
- LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning | arXiv | 2024.01.29 | Paper Link
- LLbezpeky: Leveraging Large Language Models for Vulnerability Detection | arXiv | 2024.01.13 | Paper Link
- Software Vulnerability Detection with GPT and In-Context Learning | DSC | 2024.01.08 | Paper Link
- GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis | ICSE | 2023.12.25 | Paper Link
- Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities | arXiv | 2023.11.16 | Paper Link
- The Hitchhiker's Guide to Program Analysis: A Journey with Large Language Models | arXiv | 2023.11.15 | Paper Link
- Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives | TPS-ISA | 2023.10.16 | Paper Link
- Large Language Models for Test-Free Fault Localization | ICSE | 2023.10.03 | Paper Link
- DefectHunter: A Novel LLM-Driven Boosted-Conformer-based Code Vulnerability Detection Mechanism | arXiv | 2023.09.27 | Paper Link
- Software Vulnerability Detection using Large Language Models | ISSRE Workshop | 2023.09.02 | Paper Link
- Using ChatGPT as a Static Application Security Testing Tool | arXiv | 2023.08.28 | Paper Link
- Prompt-Enhanced Software Vulnerability Detection Using ChatGPT | ICSE | 2023.08.24 | Paper Link
- VulLibGen: Identifying Vulnerable Third-Party Libraries via Generative Pre-Trained Model | arXiv | 2023.08.09 | Paper Link
- Evaluation of ChatGPT Model for Vulnerability Detection | arXiv | 2023.04.12 | Paper Link
- Software Vulnerability and Functionality Assessment using LLMs | arXiv | 2024.03.13 | Paper Link
- Finetuning Large Language Models for Vulnerability Detection | arXiv | 2024.03.01 | Paper Link
- Detecting software vulnerabilities using Language Models | CSR | 2023.02.23 | Paper Link
Since this part has evolved to focus more on Code LLM research, it is no longer actively maintained.
- Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models | arXiv | 2025.02.09 | Paper Link
- ContractTinker: LLM-Empowered Vulnerability Repair for Real-World Smart Contracts | arXiv | 2024.09.15 | Paper Link
- An Exploratory Study on Fine-Tuning Large Language Models for Secure Code Generation | arXiv | 2024.08.17 | Paper Link
- Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval | arXiv | 2024.07.04 | Paper Link
- DistiLRR: Transferring Code Repair for Low-Resource Programming Languages | arXiv | 2024.06.20 | Paper Link
- Code Repair with LLMs gives an Exploration-Exploitation Tradeoff | arXiv | 2024.05.30 | Paper Link
- LLM Security Guard for Code | International Conference on Evaluation and Assessment in Software Engineering | 2024.05.03 | Paper Link
- Do Neutral Prompts Produce Insecure Code? FormAI-v2 Dataset: Labelling Vulnerabilities in Code Generated by Large Language Models | arXiv | 2024.04.29 | Paper Link
- Evolutionary Large Language Models for Hardware Security: A Comparative Survey | arXiv | 2024.04.25 | Paper Link
- FLAG: Finding Line Anomalies (in code) with Generative AI | arXiv | 2023.07.22 | Paper Link
- Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions | ICSE | 2023.10.24 | Paper Link
- DebugBench: Evaluating Debugging Capability of Large Language Models | ACL Findings | 2024.01.11 | Paper Link
- Shifting the Lens: Detecting Malware in npm Ecosystem with Large Language Models | arXiv | 2024.03.18 | Paper Link
- Using ChatGPT to Analyze Ransomware Messages and to Predict Ransomware Threats | Research Square | 2023.11.21 | Paper Link
- Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4 | arXiv | 2023.12.13 | Paper Link
- Evaluating and Explaining Large Language Models for Code Using Syntactic Structures | arXiv | 2023.08.07 | Paper Link
- Understanding Programs by Exploiting (Fuzzing) Test Cases | ACL Findings | 2023.01.12 | Paper Link
- Large Language Models for Code Analysis: Do LLMs Really Do Their Job? | USENIX | 2024.03.05 | Paper Link
- LLM4Decompile: Decompiling Binary Code with Large Language Models | EMNLP | 2024.03.08 | Paper Link
- Pop Quiz! Can a Large Language Model Help With Reverse Engineering? | arXiv | 2022.02.02 | Paper Link
- Large Language Models for Code: Security Hardening and Adversarial Testing | ACM SIGSAC Conference on Computer and Communications Security | 2023.09.29 | Paper Link
- How Secure is Code Generated by ChatGPT? | SMC | 2023.04.19 | Paper Link
- A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages | arXiv | 2023.08.08 | Paper Link
- Can Large Language Models Identify And Reason About Security Vulnerabilities? Not Yet | arXiv | 2023.12.19 | Paper Link
- Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation | NeurIPS | 2023.10.30 | Paper Link
- Generate and Pray: Using SALLMS to Evaluate the Security of LLM Generated Code | arXiv | 2023.11.01 | Paper Link
- No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT | IEEE Trans. Software Eng. | 2023.08.09 | Paper Link
- The Effectiveness of Large Language Models (ChatGPT and CodeBERT) for Security-Oriented Code Analysis | arXiv | 2023.08.29 | Paper Link
- Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions | S&P | 2021.12.16 | Paper Link
- Bugs in Large Language Models Generated Code | arXiv | 2024.03.18 | Paper Link
- Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants | USENIX | 2023.02.27 | Paper Link
-
VulnRepairEval: An Exploit-Based Evaluation Framework for Assessing Large Language Model Vulnerability Repair Capabilities | arXiv | 2025.09.03 | Paper Link
-
Automated Repair of C Programs Using Large Language Models | arXiv | 2025.09.02 | Paper Link
-
On the Evaluation of Large Language Models in Multilingual Vulnerability Repair | arXiv | 2025.08.05 | Paper Link
-
Repair-R1: Better Test Before Repair | arXiv | 2025.07.30 | Paper Link
-
Repairing vulnerabilities without invisible hands. A differentiated replication study on LLMs | arXiv | 2025.07.28 | Paper Link
-
The Impact of Fine-tuning Large Language Models on Automated Program Repair | arXiv | 2025.07.26 | Paper Link
-
Bug Fixing with Broader Context: Enhancing LLM-Based Program Repair via Layered Knowledge Injection | arXiv | 2025.06.30 | Paper Link
-
Repair Ingredients Are All You Need: Improving Large Language Model-Based Program Repair via Repair Ingredients Search | arXiv | 2025.06.30 | Paper Link
-
A Survey of LLM-based Automated Program Repair: Taxonomies, Design Paradigms, and Applications | arXiv | 2025.06.30 | Paper Link
-
Empirical Evaluation of Generalizable Automated Program Repair with Large Language Models | arXiv | 2025.06.03 | Paper Link
-
Boosting Open-Source LLMs for Program Repair via Reasoning Transfer and LLM-Guided Reinforcement Learning | arXiv | 2025.06.04 | Paper Link
-
Fixing 7,400 Bugs for 1$: Cheap Crash-Site Program Repair | arXiv | 2025.05.19 | Paper Link
-
Adversarial Reasoning for Repair Based on Inferred Program Intent | arXiv | 2025.05.19 | Paper Link
-
Synthetic Code Surgery: Repairing Bugs and Vulnerabilities with LLMs and Synthetic Data | arXiv | 2025.05.12 | Paper Link
-
Automated Repair of Ambiguous Natural Language Requirements | arXiv | 2025.05.12 | Paper Link
-
Towards Effectively Leveraging Execution Traces for Program Repair with Code LLMs | arXiv | 2025.05.07 | Paper Link
-
The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models | arXiv | 2025.05.05 | Paper Link
-
Adapting Knowledge Prompt Tuning for Enhanced Automated Program Repair | arXiv | 2025.04.02 | Paper Link
-
LLM4CVE: Enabling Iterative Automated Vulnerability Repair with Large Language Models | arXiv | 2025.01.07 | Paper Link
-
From Defects to Demands: A Unified, Iterative, and Heuristically Guided LLM-Based Framework for Automated Software Repair and Requirement Realization | arXiv | 2024.12.06 | Paper Link
-
Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair | arXiv | 2024.12.05 | Paper Link
-
Fixing Security Vulnerabilities with AI in OSS-Fuzz | arXiv | 2024.11.21 | Paper Link
-
A Comprehensive Survey of AI-Driven Advancements and Techniques in Automated Program Repair and Code Generation | arXiv | 2024.11.12 | Paper Link
-
The Best Defense is a Good Offense: Countering LLM-Powered Cyberattacks | arXiv | 2024.10.20 | Paper Link
-
APOLLO: A GPT-based tool to detect phishing emails and generate explanations that warn users | arXiv | 2024.10.10 | Paper Link
-
Fixing Code Generation Errors for Large Language Models | arXiv | 2024.09.01 | Paper Link
-
MergeRepair: An Exploratory Study on Merging Task-Specific Adapters in Code LLMs for Automated Program Repair | arXiv | 2024.08.26 | Paper Link
-
Automated Software Vulnerability Patching using Large Language Models | arXiv | 2024.08.24 | Paper Link
-
Enhancing LLM-Based Automated Program Repair with Design Rationales | ASE | 2024.08.22 | Paper Link
-
RePair: Automated Program Repair with Process-based Feedback | ACL Findings | 2024.08.21 | Paper Link
-
Revisiting Evolutionary Program Repair via Code Language Model | arXiv | 2024.08.20 | Paper Link
-
ThinkRepair: Self-Directed Automated Program Repair | ACM SIGSOFT International Symposium on Software Testing and Analysis | 2024.07.30 | Paper Link
-
Automated C/C++ Program Repair for High-Level Synthesis via Large Language Models | ACM/IEEE International Symposium on Machine Learning for CAD | 2024.07.04 | Paper Link
-
Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis | arXiv | 2024.06.04 | Paper Link
-
A Case Study of LLM for Automated Vulnerability Repair: Assessing Impact of Reasoning and Patch Validation Feedback | Proceedings of the 1st ACM International Conference on AI-Powered Software | 2024.05.24 | Paper Link
-
Automated Repair of AI Code with Large Language Models and Formal Verification | arXiv | 2024.05.14 | Paper Link
-
A Systematic Literature Review on Large Language Models for Automated Program Repair | arXiv | 2024.05.12 | Paper Link
-
Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models | arXiv | 2024.03.23 | Paper Link
-
How Far Can We Go with Practical Function-Level Program Repair? | arXiv | 2024.04.19 | Paper Link
-
Multi-Objective Fine-Tuning for Enhanced Program Repair with LLMs | arXiv | 2024.04.22 | Paper Link
-
Aligning LLMs for FL-free Program Repair | arXiv | 2024.04.13 | Paper Link
-
When Large Language Models Confront Repository-Level Automatic Program Repair: How Well They Done? | ICSE | 2023.03.01 | Paper Link
-
ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs | arXiv | 2024.03.07 | Paper Link
-
LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward | arXiv | 2024.02.22 | Paper Link
-
Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair | ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering | 2023.11.08 | Paper Link
-
Better Patching Using LLM Prompting, via Self-Consistency | ASE | 2023.08.16 | Paper Link
-
Teaching Large Language Models to Self-Debug | ICLR | 2023.10.05 | Paper Link
-
Enhanced Automated Code Vulnerability Repair using Large Language Models | Eng. Appl. Artif. Intell. | 2024.01.08 | Paper Link
-
A Study of Vulnerability Repair in JavaScript Programs with Large Language Models | WWW | 2023.03.19 | Paper Link
-
Fixing Hardware Security Bugs with Large Language Models | arXiv | 2023.02.02 | Paper Link
-
DIVAS: An LLM-based End-to-End Framework for SoC Security Analysis and Policy-based Protection | arXiv | 2023.08.14 | Paper Link
-
ZeroLeak: Using LLMs for Scalable and Cost Effective Side-Channel Patching | arXiv | 2023.08.24 | Paper Link
-
InferFix: End-to-End Program Repair with LLMs | ESEC/FSE | 2023.03.13 | Paper Link
-
Can LLMs Patch Security Issues? | arXiv | 2024.02.19 | Paper Link
-
How Effective Are Neural Networks for Fixing Security Vulnerabilities | ISSTA | 2023.05.29 | Paper Link
-
Examining Zero-Shot Vulnerability Repair with Large Language Models | S&P | 2022.08.15 | Paper Link
-
Security Code Review by LLMs: A Deep Dive into Responses | arXiv | 2024.01.29 | Paper Link
-
Practical Program Repair in the Era of Large Pre-trained Language Models | arXiv | 2022.10.25 | Paper Link
-
AI-powered patching: the future of automated vulnerability fixes | Google | 2024.01.31 | Paper Link
-
An Analysis of the Automatic Bug Fixing Performance of ChatGPT | APR@ICSE | 2023.01.20 | Paper Link
-
Automatic Program Repair with OpenAI's Codex: Evaluating QuixBugs | arXiv | 2023.11.06 | Paper Link
-
LLM-driven Provenance Forensics for Threat Investigation and Detection | arXiv | 2025.08.29 | Paper Link
-
FALCON: Autonomous Cyber Threat Intelligence Mining with LLMs for IDS Rule Generation | arXiv | 2025.08.26 | Paper Link
-
Chimera: Harnessing Multi-Agent LLMs for Automatic Insider Threat Simulation | arXiv | 2025.08.11 | Paper Link
-
Think Broad, Act Narrow: CWE Identification with Multi-Agent Large Language Models | arXiv | 2025.08.02 | Paper Link
-
OFCnetLLM: Large Language Model for Network Monitoring and Alertness | arXiv | 2025.07.30 | Paper Link
-
Large Language Model-Based Framework for Explainable Cyberattack Detection in Automatic Generation Control Systems | arXiv | 2025.07.29 | Paper Link
-
From Alerts to Intelligence: A Novel LLM-Aided Framework for Host-based Intrusion Detection | arXiv | 2025.07.15 | Paper Link
-
Can Large Language Models Improve Phishing Defense? A Large-Scale Controlled Experiment on Warning Dialogue Explanations | arXiv | 2025.07.10 | Paper Link
-
Large Language Models for Network Intrusion Detection Systems: Foundations, Implementations, and Future Directions | arXiv | 2025.07.07 | Paper Link
-
Adaptive Linguistic Prompting (ALP) Enhances Phishing Webpage Detection in Multimodal Large Language Models | arXiv | 2025.06.29 | Paper Link
-
Leveraging Large Language Model for Intelligent Log Processing and Autonomous Debugging in Cloud AI Platforms | arXiv | 2025.06.22 | Paper Link
-
SmartGuard: Leveraging Large Language Models for Network Attack Detection through Audit Log Analysis and Summarization | arXiv | 2025.06.20 | Paper Link
-
PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection | arXiv | 2025.06.18 | Paper Link
-
LLM-Powered Intent-Based Categorization of Phishing Emails | arXiv | 2025.06.17 | Paper Link
-
Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability | arXiv | 2025.06.16 | Paper Link
-
Training RL Agents for Multi-Objective Network Defense Tasks | arXiv | 2025.06.13 | Paper Link
-
A Unified Framework for Human AI Collaboration in Security Operations Centers with Trusted Autonomy | arXiv | 2025.06.01 | Paper Link
-
MultiPhishGuard: An LLM-based Multi-Agent System for Phishing Email Detection | arXiv | 2025.05.27 | Paper Link
-
IRCopilot: Automated Incident Response with Large Language Models | arXiv | 2025.05.27 | Paper Link
-
LLM-Driven APT Detection for 6G Wireless Networks: A Systematic Review and Taxonomy | arXiv | 2025.05.24 | Paper Link
-
Benchmarking LLMs in an Embodied Environment for Blue Team Threat Hunting | arXiv | 2025.05.17 | Paper Link
-
Automating Security Audit Using Large Language Model based Agent: An Exploration Experiment | arXiv | 2025.05.16 | Paper Link
-
On Technique Identification and Threat-Actor Attribution using LLMs and Embedding Models | arXiv | 2025.05.15 | Paper Link
-
Towards AI-Driven Human-Machine Co-Teaming for Adaptive and Agile Cyber Security Operation Centers | arXiv | 2025.05.09 | Paper Link
-
Large Language Models are Autonomous Cyber Defenders | arXiv | 2025.05.08 | Paper Link
-
Bridging Expertise Gaps: The Role of LLMs in Human-AI Collaboration for Cybersecurity | arXiv | 2025.05.06 | Paper Link
-
LLM-Based Threat Detection and Prevention Framework for IoT Ecosystems | arXiv | 2025.05.01 | Paper Link
-
Improving Phishing Email Detection Performance of Small Large Language Models | arXiv | 2025.04.29 | Paper Link
-
AnomalyGen: An Automated Semantic Log Sequence Generation Framework with LLM for Anomaly Detection | arXiv | 2025.04.16 | Paper Link
-
Investigating cybersecurity incidents using large language models in latest-generation wireless networks | arXiv | 2025.04.14 | Paper Link
-
SoK: LLM-based Log Parsing | arXiv | 2025.04.07 | Paper Link
-
Knowledge Transfer from LLMs to Provenance Analysis: A Semantic-Augmented Method for APT Detection | arXiv | 2025.03.24 | Paper Link
-
Large Language Models powered Network Attack Detection: Architecture, Opportunities and Case Study | arXiv | 2025.03.24 | Paper Link
-
Payload-Aware Intrusion Detection with CMAE and Large Language Models | arXiv | 2025.03.23 | Paper Link
-
RedChronos: A Large Language Model-Based Log Analysis System for Insider Threat Detection in Enterprises | arXiv | 2025.03.05 | Paper Link
-
Enhancing Cybersecurity in Critical Infrastructure with LLM-Assisted Explainable IoT Systems | arXiv | 2025.03.05 | Paper Link
-
Transforming Cyber Defense: Harnessing Agentic and Frontier AI for Proactive, Ethical Threat Intelligence | arXiv | 2025.02.28 | Paper Link
-
Cyber Defense Reinvented: Large Language Models as Threat Intelligence Copilots | arXiv | 2025.02.28 | Paper Link
-
Design and implementation of a distributed security threat detection system integrating federated learning and multimodal LLM | arXiv | 2025.02.28 | Paper Link
-
LAMD: Context-driven Android Malware Detection and Classification with LLMs | arXiv | 2025.02.18 | Paper Link
-
APT-LLM: Embedding-Based Anomaly Detection of Cyber Advanced Persistent Threats Using Large Language Models | arXiv | 2025.02.13 | Paper Link
-
AdaPhish: AI-Powered Adaptive Defense and Education Resource Against Deceptive Emails | arXiv | 2025.02.05 | Paper Link
-
SHIELD: APT Detection and Intelligent Explanation Using LLM | arXiv | 2025.02.04 | Paper Link
-
LLM-based event log analysis techniques: A survey | arXiv | 2025.02.02 | Paper Link
-
TORCHLIGHT: Shedding LIGHT on Real-World Attacks on Cloudless IoT Devices Concealed within the Tor Network | arXiv | 2025.01.28 | Paper Link
-
Confront Insider Threat: Precise Anomaly Detection in Behavior Logs Based on LLM Fine-Tuning | COLING | 2024 | Paper Link
-
Exploring Large Language Models for Semantic Analysis and Categorization of Android Malware | arXiv | 2025.01.08 | Paper Link
-
Large Multimodal Agents for Accurate Phishing Detection with Enhanced Token Optimization and Cost Reduction | arXiv | 2024.12.03 | Paper Link
-
LogLM: From Task-based to Instruction-based Automated Log Analysis | arXiv | 2024.10.12 | Paper Link
-
LogLLM: Log-based Anomaly Detection Using Large Language Models | arXiv | 2024.11.13 | Paper Link
-
Using Large Language Models for Template Detection from Security Event Logs | arXiv | 2024.09.08 | Paper Link
-
A Comparative Study on Large Language Models for Log Parsing | arXiv | 2024.09.04 | Paper Link
-
LUK: Empowering Log Understanding with Expert Knowledge from Large Language Models | arXiv | 2024.09.03 | Paper Link
-
XG-NID: Dual-Modality Network Intrusion Detection using a Heterogeneous Graph Neural Network and Large Language Model | arXiv | 2024.08.27 | Paper Link
-
LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models | arXiv | 2024.08.25 | Paper Link
-
Automated Phishing Detection Using URLs and Webpages | arXiv | 2024.08.16 | Paper Link
-
Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey | arXiv | 2024.08.14 | Paper Link
-
Multimodal Large Language Models for Phishing Webpage Detection and Identification | arXiv | 2024.08.12 | Paper Link
-
Utilizing Large Language Models to Optimize the Detection and Explainability of Phishing Websites | arXiv | 2024.08.11 | Paper Link
-
Towards Explainable Network Intrusion Detection using Large Language Models | arXiv | 2024.08.08 | Paper Link
-
Audit-LLM: Multi-Agent Collaboration for Log-based Insider Threat Detection | arXiv | 2024.07.12 | Paper Link
-
LogEval: A Comprehensive Benchmark Suite for Large Language Models In Log Analysis | arXiv | 2024.07.02 | Paper Link
-
Defending Against Social Engineering Attacks in the Age of LLMs | EMNLP | 2024.06.18 | Paper Link
-
Anomaly Detection on Unstable Logs with GPT Models | arXiv | 2024.06.11 | Paper Link
-
ULog: Unsupervised Log Parsing with Large Language Models through Log Contrastive Units | arXiv | 2024.06.11 | Paper Link
-
Generative AI-in-the-loop: Integrating LLMs and GPTs into the Next Generation Networks | arXiv | 2024.06.06 | Paper Link
-
Log Parsing with Self-Generated In-Context Learning and Self-Correction | arXiv | 2024.06.05 | Paper Link
-
Large Language Models in Wireless Application Design: In-Context Learning-enhanced Automatic Network Intrusion Detection | arXiv | 2024.05.17 | Paper Link
-
DoLLM: How Large Language Models Understanding Network Flow Data to Detect Carpet Bombing DDoS | arXiv | 2024.05.12 | Paper Link
-
LLMParser: An Exploratory Study on Using Large Language Models for Log Parsing | ICSE | 2024.04.27 | Paper Link
-
Large Language Models Spot Phishing Emails with Surprising Accuracy: A Comparative Analysis of Performance | arXiv | 2024.04.23 | Paper Link
-
ChatGPT for digital forensic investigation: The good, the bad, and the unknown | Forensic Science International: Digital Investigation | 2023.07.10 | Paper Link
-
HuntGPT: Integrating Machine Learning-Based Anomaly Detection and Explainable AI with Large Language Models (LLMs) | arXiv | 2023.09.27 | Paper Link
-
Revolutionizing Cyber Threat Detection with Large Language Models: A privacy-preserving BERT-based Lightweight Model for IoT/IIoT Devices | IEEE Access | 2024.02.08 | Paper Link
-
Explaining Tree Model Decisions in Natural Language for Network Intrusion Detection | arXiv | 2023.10.30 | Paper Link
-
Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models | arXiv | 2023.11.30 | Paper Link
-
Prompted Contextual Vectors for Spear-Phishing Detection | arXiv | 2024.02.14 | Paper Link
-
Evaluating the Performance of ChatGPT for Spam Email Detection | arXiv | 2024.02.23 | Paper Link
-
An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach | arXiv | 2023.11.12 | Paper Link
-
Application of Large Language Models to DDoS Attack Detection | International Conference on Security and Privacy in Cyber-Physical Systems and Smart Vehicles | 2024.02.05 | Paper Link
-
Web Content Filtering through knowledge distillation of Large Language Models | WI-IAT | 2023.05.10 | Paper Link
-
Lemur: Log Parsing with Entropy Sampling and Chain-of-Thought Merging | arXiv | 2024.03.02 | Paper Link
-
Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies | ICPC | 2024.01.26 | Paper Link
-
LogGPT: Log Anomaly Detection via GPT | BigData | 2023.12.11 | Paper Link
-
LogGPT: Exploring ChatGPT for Log-Based Anomaly Detection | HPCC/DSS/SmartCity/DependSys | 2023.09.14 | Paper Link
-
Log-based Anomaly Detection based on EVT Theory with feedback | arXiv | 2023.09.30 | Paper Link
-
Benchmarking Large Language Models for Log Analysis, Security, and Interpretation | J. Netw. Syst. Manag. | 2023.11.24 | Paper Link
-
An Automated Attack Investigation Approach Leveraging Threat-Knowledge-Augmented Large Language Models | arXiv | 2025.09.01 | Paper Link
-
Cybersecurity AI: Hacking the AI Hackers via Prompt Injection | arXiv | 2025.09.01 | Paper Link
-
SoK: Large Language Model-Generated Textual Phishing Campaigns End-to-End Analysis of Generation, Characteristics, and Detection | arXiv | 2025.08.29 | Paper Link
-
Pentest-R1: Towards Autonomous Penetration Testing Reasoning Optimized via Two-Stage Reinforcement Learning | arXiv | 2025.08.10 | Paper Link
-
PenTest2.0: Towards Autonomous Privilege Escalation Using GenAI | arXiv | 2025.08.09 | Paper Link
-
Prompt to Pwn: Automated Exploit Generation for Smart Contracts | arXiv | 2025.08.02 | Paper Link
-
Can We End the Cat-and-Mouse Game? Simulating Self-Evolving Phishing Attacks with LLMs and Genetic Algorithms | arXiv | 2025.07.29 | Paper Link
-
Exploiting Jailbreaking Vulnerabilities in Generative AI to Bypass Ethical Safeguards for Facilitating Phishing Attacks | arXiv | 2025.07.16 | Paper Link
-
LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models | arXiv | 2025.07.13 | Paper Link
-
On the Surprising Efficacy of LLMs for Penetration-Testing | arXiv | 2025.07.01 | Paper Link
-
From Promise to Peril: Rethinking Cybersecurity Red and Blue Teaming in the Age of LLMs | arXiv | 2025.06.16 | Paper Link
-
On the Ethics of Using LLMs for Offensive Security | arXiv | 2025.06.10 | Paper Link
-
ReCopilot: Reverse Engineering Copilot in Binary Analysis | arXiv | 2025.05.22 | Paper Link
-
LLMs unlock new paths to monetizing exploits | arXiv | 2025.05.16 | Paper Link
-
AutoPentest: Enhancing Vulnerability Management With Autonomous LLM Agents | arXiv | 2025.05.15 | Paper Link
-
Offensive Security for AI Systems: Concepts, Practices, and Applications | arXiv | 2025.05.09 | Paper Link
-
Weaponizing Language Models for Cybersecurity Offensive Operations: Automating Vulnerability Assessment Report Validation; A Review Paper | arXiv | 2025.05.07 | Paper Link
-
PwnGPT: Automatic Exploit Generation Based on Large Language Models | ACL | 2025.04 | Paper Link
-
On the Feasibility of Using MultiModal LLMs to Execute AR Social Engineering Attacks | arXiv | 2025.04.16 | Paper Link
-
Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design | arXiv | 2025.04.14 | Paper Link
-
Red Teaming with Artificial Intelligence-Driven Cyberattacks: A Scoping Review | arXiv | 2025.03.25 | Paper Link
-
A Framework for Evaluating Emerging Cyberattack Capabilities of AI | arXiv | 2025.03.15 | Paper Link
-
Jailbreaking Generative AI: Empowering Novices to Conduct Phishing Attacks | arXiv | 2025.03.03 | Paper Link
-
CAI: An Open, Bug Bounty-Ready Cybersecurity AI | arXiv | 2025.04.15 | Paper Link
-
RapidPen: Fully Automated IP-to-Shell Penetration Testing with LLM-based Agents | arXiv | 2025.02.23 | Paper Link
-
Construction and Evaluation of LLM-based agents for Semi-Autonomous penetration testing | arXiv | 2025.02.21 | Paper Link
-
OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities | arXiv | 2025.02.18 | Paper Link
-
PenTest++: Elevating Ethical Hacking with AI and Automation | arXiv | 2025.02.13 | Paper Link
-
Can LLMs Hack Enterprise Networks? Autonomous Assumed Breach Penetration-Testing Active Directory Networks | arXiv | 2025.02.06 | Paper Link
-
On the Feasibility of Using LLMs to Execute Multistage Network Attacks | arXiv | 2025.01.27 | Paper Link
-
HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing | arXiv | 2024.12.02 | Paper Link
-
Hacking CTFs with Plain Agents | arXiv | 2024.12.03 | Paper Link
-
Evaluating and Improving the Robustness of Security Attack Detectors Generated by LLMs | arXiv | 2024.11.27 | Paper Link
-
AI-Augmented Ethical Hacking: A Practical Examination of Manual Exploitation and Privilege Escalation in Linux Environments | arXiv | 2024.11.26 | Paper Link
-
Next-Generation Phishing: How LLM Agents Empower Cyber Attackers | arXiv | 2024.11.22 | Paper Link
-
Adapting to Cyber Threats: A Phishing Evolution Network (PEN) Framework for Phishing Generation and Analyzing Evolution Patterns using Large Language Models | arXiv | 2024.11.18 | Paper Link
-
Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks | arXiv | 2024.11.18 | Paper Link
-
PentestAgent: Incorporating LLM Agents to Automated Penetration Testing | arXiv | 2024.11.07 | Paper Link
-
AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? | arXiv | 2024.11.02 | Paper Link
-
AutoPenBench: Benchmarking Generative Agents for Penetration Testing | arXiv | 2024.10.28 | Paper Link
-
Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements | arXiv | 2024.10.25 | Paper Link
-
On the Feasibility of Fully AI-automated Vishing Attacks | arXiv | 2024.09.20 | Paper Link
-
Hacking, The Lazy Way: LLM Augmented Pentesting | arXiv | 2024.09.14 | Paper Link
-
Is Generative AI the Next Tactical Cyber Weapon For Threat Actors? Unforeseen Implications of AI Generated Cyber Attacks | arXiv | 2024.08.23 | Paper Link
-
CIPHER: Cybersecurity Intelligent Penetration-testing Helper for Ethical Researcher | Sensors | 2024.08.21 | Paper Link
-
Using Retriever Augmented Large Language Models for Attack Graph Generation | arXiv | 2024.08.11 | Paper Link
-
Practical Attacks against Black-box Code Completion Engines | arXiv | 2024.08.05 | Paper Link
-
PenHeal: A Two-Stage LLM Framework for Automated Pentesting and Optimal Remediation | Proceedings of the Workshop on Autonomous Cybersecurity | 2024.07.25 | Paper Link
-
From Sands to Mansions: Enabling Automatic Full-Life-Cycle Cyberattack Construction with LLM | arXiv | 2024.07.24 | Paper Link
-
The Shadow of Fraud: The Emerging Danger of AI-powered Social Engineering and its Possible Cure | arXiv | 2024.07.22 | Paper Link
-
Tactics, Techniques, and Procedures (TTPs) in Interpreted Malware: A Zero-Shot Generation with Large Language Models | arXiv | 2024.07.11 | Paper Link
-
Assessing AI vs Human-Authored Spear Phishing SMS Attacks: An Empirical Study Using the TRAPD Method | arXiv | 2024.06.18 | Paper Link
-
Getting pwn’d by AI: Penetration Testing with Large Language Models | ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering | 2023.08.17 | Paper Link
-
RatGPT: Turning online LLMs into Proxies for Malware Attacks | arXiv | 2023.09.07 | Paper Link
-
AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks | arXiv | 2024.03.02 | Paper Link
-
PentestGPT: An LLM-empowered Automatic Penetration Testing Tool | USENIX | 2023.08.13 | Paper Link
-
From Text to MITRE Techniques: Exploring the Malicious Use of Large Language Models for Generating Cyber Attack Payloads | arXiv | 2023.05.24 | Paper Link
-
From Chatbots to PhishBots? - Preventing Phishing scams created using ChatGPT, Google Bard and Claude | arXiv | 2024.03.10 | Paper Link
-
Exploring the Dark Side of AI: Advanced Phishing Attack Design and Deployment Using ChatGPT | CNS | 2023.09.19 | Paper Link
-
Using Large Language Models for Cybersecurity Capture-The-Flag Challenges and Certification Questions | arXiv | 2023.08.21 | Paper Link
-
Evaluating LLMs for Privilege-Escalation Scenarios | arXiv | 2023.10.23 | Paper Link
-
Malla: Demystifying Real-world Large Language Model Integrated Malicious Services | USENIX | 2024.01.06 | Paper Link
-
LLMs Killed the Script Kiddie: How Agents Supported by Large Language Models Change the Landscape of Network Threat Testing | arXiv | 2023.10.10 | Paper Link
-
From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy | IEEE Access | 2023.07.03 | Paper Link
-
Impact of Big Data Analytics and ChatGPT on Cybersecurity | I3CS | 2023.05.22 | Paper Link
-
Identifying and Mitigating the Security Risks of Generative AI | Foundations and Trends in Privacy and Security | 2023.12.29 | Paper Link
-
From Legacy to Standard: LLM-Assisted Transformation of Cybersecurity Playbooks into CACAO Format | arXiv | 2025.08.05 | Paper Link
-
Information Security Based on LLM Approaches: A Review | arXiv | 2025.07.24 | Paper Link
-
Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques | arXiv | 2025.07.18 | Paper Link
-
Cybersecurity AI: The Dangerous Gap Between Automation and Autonomy | arXiv | 2025.06.30 | Paper Link
-
Using LLMs for Security Advisory Investigations: How Far Are We? | arXiv | 2025.06.16 | Paper Link
-
Exposing the Impact of GenAI for Cybercrime: An Investigation into the Dark Side | arXiv | 2025.05.29 | Paper Link
-
Large Language Models for IT Automation Tasks: Are We There Yet? | arXiv | 2025.05.26 | Paper Link
-
Mitigating Cyber Risk in the Age of Open-Weight LLMs: Policy Gaps and Technical Realities | arXiv | 2025.05.21 | Paper Link
-
ACSE-Eval: Can LLMs threat model real-world cloud infrastructure? | arXiv | 2025.05.16 | Paper Link
-
LLMs Suitability for Network Security: A Case Study of STRIDE Threat Modeling | arXiv | 2025.05.06 | Paper Link
-
From Texts to Shields: Convergence of Large Language Models and Cybersecurity | arXiv | 2025.05.01 | Paper Link
-
Automatically Generating Rules of Malicious Software Packages via Large Language Model | arXiv | 2025.04.24 | Paper Link
-
Exploring the Role of Large Language Models in Cybersecurity: A Systematic Survey | arXiv | 2025.04.22 | Paper Link
-
SoK: Frontier AI's Impact on the Cybersecurity Landscape | arXiv | 2025.04.07 | Paper Link
-
Emerging Cyber Attack Risks of Medical AI Agents | arXiv | 2025.04.02 | Paper Link
-
Inducing Personality in LLM-Based Honeypot Agents: Measuring the Effect on Human-Like Agenda Generation | arXiv | 2025.03.25 | Paper Link
-
ChatIoT: Large Language Model-based Security Assistant for Internet of Things with Retrieval-Augmented Generation | arXiv | 2025.02.14 | Paper Link
-
Empowering AIOps: Leveraging Large Language Models for IT Operations Management | arXiv | 2025.01.21 | Paper Link
-
BARTPredict: Empowering IoT Security with LLM-Driven Cyber Threat Prediction | arXiv | 2025.01.03 | Paper Link
-
Toward Intelligent and Secure Cloud: Large Language Model Empowered Proactive Defense | arXiv | 2024.12.30 | Paper Link
-
Emerging Security Challenges of Large Language Models | arXiv | 2024.12.23 | Paper Link
-
Ontology-Aware RAG for Improved Question-Answering in Cybersecurity Education | arXiv | 2024.12.10 | Paper Link
-
Integrating Large Language Models with Internet of Things Applications | arXiv | 2024.10.25 | Paper Link
-
CmdCaliper: A Semantic-Aware Command-Line Embedding Model and Dataset for Security Research | EMNLP | 2024.10.02 | Paper Link
-
Advancing Cyber Incident Timeline Analysis Through Rule Based AI and Large Language Models | arXiv | 2024.09.25 | Paper Link
-
Contextualized AI for Cyber Defense: An Automated Survey using LLMs | arXiv | 2024.09.20 | Paper Link
-
LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems | arXiv | 2024.09.15 | Paper Link
-
ScriptSmith: A Unified LLM Framework for Enhancing IT Operations via Automated Bash Script Generation, Assessment, and Refinement | arXiv | 2024.09.12 | Paper Link
-
Beyond Detection: Leveraging Large Language Models for Cyber Attack Prediction in IoT Networks | arXiv | 2024.08.26 | Paper Link
-
MistralBSM: Leveraging Mistral-7B for Vehicular Networks Misbehavior Detection | arXiv | 2024.07.26 | Paper Link
-
MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation | arXiv | 2024.07.22 | Paper Link
-
Disassembling Obfuscated Executables with LLM | arXiv | 2024.07.12 | Paper Link
-
On Large Language Models in National Security Applications | arXiv | 2024.07.03 | Paper Link
-
Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications | arXiv | 2024.06.16 | Paper Link
-
Exploring the Efficacy of Large Language Models (GPT-4) in Binary Reverse Engineering | arXiv | 2024.06.09 | Paper Link
-
A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions | arXiv | 2024.05.23 | Paper Link
-
LLMPot: Automated LLM-based Industrial Protocol and Physical Process Emulation for ICS Honeypots | arXiv | 2024.05.10 | Paper Link
-
Critical Infrastructure Protection: Generative AI, Challenges, and Opportunities | arXiv | 2024.05.08 | Paper Link
-
Large Language Models for Cyber Security: A Systematic Literature Review | arXiv | 2024.05.08 | Paper Link
-
AppPoet: Large Language Model based Android malware detection via multi-view prompt engineering | arXiv | 2024.04.29 | Paper Link
-
Act as a Honeytoken Generator! An Investigation into Honeytoken Generation with Large Language Models | arXiv | 2024.04.24 | Paper Link
-
How Far Have We Gone in Stripped Binary Code Understanding Using Large Language Models | arXiv | 2024.04.16 | Paper Link
-
Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions | CHI | 2024.02.07 | Paper Link
-
Prompting Is All You Need: Automated Android Bug Replay with Large Language Models | ICSE | 2023.07.18 | Paper Link
-
Enhancing Network Management Using Code Generated by Large Language Models | Proceedings of the 22nd ACM Workshop on Hot Topics in Networks | 2023.08.11 | [Paper Link](https://arxiv.org/abs/2308.06261)
-
Employing LLMs for Incident Response Planning and Review | arXiv | 2024.03.02 | Paper Link
-
LLM in the Shell: Generative Honeypots | EuroS&P Workshop | 2024.02.09 | Paper Link
-
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations | arXiv | 2023.12.07 | Paper Link
-
Harnessing the Power of LLM to Support Binary Taint Analysis | arXiv | 2023.10.12 | Paper Link
-
LLM for SoC Security: A Paradigm Shift | IEEE Access | 2023.10.09 | Paper Link
-
Just-in-Time Security Patch Detection -- LLM At the Rescue for Data Augmentation | arXiv | 2023.12.12 | Paper Link
-
Anatomy of an AI-powered malicious social botnet | arXiv | 2023.07.30 | Paper Link
-
An LLM-based Framework for Fingerprinting Internet-connected Devices | ACM Internet Measurement Conference | 2023.10.24 | Paper Link
-
Training Language Model Agents to Find Vulnerabilities with CTF-Dojo | arXiv | 2025.08.25 | Paper Link
-
FaultLine: Automated Proof-of-Vulnerability Generation Using LLM Agents | arXiv | 2025.07.21 | Paper Link
-
From CVE Entries to Verifiable Exploits: An Automated Multi-Agent Framework for Reproducing CVEs | arXiv | 2025.09.02 | Paper Link
-
Multi-Agent Penetration Testing AI for the Web | arXiv | 2025.08.28 | Paper Link
-
CyberSleuth: Autonomous Blue-Team LLM Agent for Web Attack Forensics | arXiv | 2025.08.28 | Paper Link
-
BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems | arXiv | 2025.07.10 | Paper Link
-
AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models | arXiv | 2025.06.17 | Paper Link
-
Measuring and Augmenting Large Language Models for Solving Capture-the-Flag Challenges | arXiv | 2025.06.21 | Paper Link
-
Towards Effective Offensive Security LLM Agents: Hyperparameter Tuning, LLM as a Judge, and a Lightweight CTF Benchmark | arXiv | 2025.08.05 | Paper Link
-
Autonomous Penetration Testing: Solving Capture-the-Flag Challenges with LLMs | arXiv | 2025.08.01 | Paper Link
-
AURA: A Multi-Agent Intelligence Framework for Knowledge-Enhanced Cyber Threat Attribution | arXiv | 2025.06.11 | Paper Link
-
Improving LLM Agents with Reinforcement Learning on Cryptographic CTF Challenges | arXiv | 2025.06.01 | Paper Link
-
RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models | arXiv | 2025.05.11 | Paper Link
-
RedTeamLLM: an Agentic AI framework for offensive security | arXiv | 2025.05.11 | Paper Link
-
AutoPatch: Multi-Agent Framework for Patching Real-World CVE Vulnerabilities | arXiv | 2025.05.07 | Paper Link
-
Agent That Debugs: Dynamic State-Guided Vulnerability Repair | arXiv | 2025.04.10 | Paper Link
-
CAI: An Open, Bug Bounty-Ready Cybersecurity AI | arXiv | 2025.04.08 | Paper Link
-
Agentic AI and the Cyber Arms Race | arXiv | 2025.03.10 | Paper Link
-
VulnBot: Autonomous Penetration Testing for A Multi-Agent Collaborative Framework | arXiv | 2025.01.23 | Paper Link
-
Multi-Agent Collaboration in Incident Response with Large Language Models | arXiv | 2024.12.03 | Paper Link
-
LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild | arXiv | 2024.10.17 | Paper Link
-
MarsCode Agent: AI-native Automated Bug Fixing | arXiv | 2024.09.04 | Paper Link
-
BreachSeek: A Multi-Agent Automated Penetration Tester | arXiv | 2024.08.31 | Paper Link
-
PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection | arXiv | 2024.08.20 | Paper Link
-
Using LLMs to Automate Threat Intelligence Analysis Workflows in Security Operation Centers | arXiv | 2024.07.18 | Paper Link
-
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities | arXiv | 2024.06.02 | Paper Link
-
Generative AI and Large Language Models for Cyber Security: All Insights You Need | arXiv | 2024.05.21 | Paper Link
-
Generative AI in Cybersecurity | arXiv | 2024.05.02 | Paper Link
-
Large Language Models for Networking: Workflow, Advances and Challenges | arXiv | 2024.04.29 | Paper Link
-
LLM Agents can Autonomously Exploit One-day Vulnerabilities | arXiv | 2024.04.17 | Paper Link
-
InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents | ACL Findings | 2024.03.25 | Paper Link
-
WIPI: A New Web Threat for LLM-Driven Web Agents | arXiv | 2024.02.26 | Paper Link
-
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents | EMNLP Findings | 2024.02.18 | Paper Link
-
Large Language Models for Networking: Applications, Enabling Techniques, and Challenges | arXiv | 2023.11.29 | Paper Link
-
TaskWeaver: A Code-First Agent Framework | arXiv | 2023.12.01 | Paper Link
-
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents | arXiv | 2024.01.08 | Paper Link
-
From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs | arXiv | 2024.02.28 | Paper Link
-
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs | ICLR | 2023.10.03 | Paper Link
-
The Rise and Potential of Large Language Model Based Agents: A Survey | arXiv | 2023.09.19 | Paper Link
-
TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage | arXiv | 2023.11.07 | Paper Link
-
Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides | ECAI | 2024.02.27 | Paper Link
-
LLM Agents can Autonomously Hack Websites | arXiv | 2024.02.16 | Paper Link
-
Out of the Cage: How Stochastic Parrots Win in Cyber Security Environments | ICAART | 2023.08.28 | Paper Link
-
LLMind: Orchestrating AI and IoT with LLM for Complex Task Execution | arXiv | 2024.02.20 | Paper Link
-
A unified cybersecurity framework for complex environments | Proceedings of the Annual Conference of the South African Institute of Computer Scientists and Information Technologists | 2018.09.26 | Paper Link
-
Cybersecurity Issues and Challenges | Handbook of research on cybersecurity issues and challenges for business and FinTech applications | 2022.08 | Paper Link
@article{zhang2025llms,
  title={When {LLMs} Meet Cybersecurity: A Systematic Literature Review},
author={Zhang, Jie and Bu, Haoyu and Wen, Hui and Liu, Yongji and Fei, Haiqiang and Xi, Rongrong and Li, Lun and Yang, Yun and Zhu, Hongsong and Meng, Dan},
journal={Cybersecurity},
volume={8},
number={1},
pages={1--41},
year={2025},
publisher={SpringerOpen}
}

