GitHub - tmylla/Awesome-LLM4Cybersecurity: An overview of LLMs for cybersecurity.

When LLMs Meet Cybersecurity: A Systematic Literature Review

🔥 Updates

📆[2025-11-17] We are pleased to announce our latest work, PACEbench: A Framework for Evaluating Practical AI Cyber-Exploitation Capabilities, a comprehensive benchmark for evaluating the capabilities of LLM-based agents in real-world cybersecurity scenarios.

📆[2025-11-17] We have updated the related papers up to 2025/08/31, with 176 new papers added (2025.03.01-2025.08.31).

📆[2025-03-03] We have updated the related papers up to 2025/02/28, with 33 new papers added (2025.01.01-2025.02.28).

📆[2025-01-21] We have updated the related papers up to 2024/12/31, with 74 new papers added (2024.09.01-2024.12.31).

📆[2025-01-08] We have included the publication venues for each paper.

📆[2024-09-21] We have updated the related papers up to 2024/08/31, with 75 new papers added (2024.06.01-2024.08.31).

When LLMs Meet Cybersecurity: A Systematic Literature Review
🔥 Updates
🌈 Introduction
🚩 Features
📜 Literatures
📖 BibTeX
⭐ Star History

🌈 Introduction

We are excited to present "When LLMs Meet Cybersecurity: A Systematic Literature Review," a comprehensive overview of LLM applications in cybersecurity.

We seek to address three key questions:

RQ1: How can we construct cybersecurity-oriented domain LLMs?
RQ2: What are the potential applications of LLMs in cybersecurity?
RQ3: What are the existing challenges and future research directions for applying LLMs in cybersecurity?

🚩 Features

(2024.08.20) Our study analyzes over 300 works, spanning 25+ LLMs and more than 10 downstream scenarios.

📜 Literatures

RQ1: How can we construct cybersecurity-oriented domain LLMs?

Cybersecurity Evaluation Benchmarks

AICrypto: A Comprehensive Benchmark For Evaluating Cryptography Capabilities of Large Language Models | arXiv | 2025.07.13 | Paper Link
ExCyTIn-Bench: Evaluating LLM agents on Cyber Threat Investigation | arXiv | 2025.07.14 | Paper Link
DefenderBench: A Toolkit for Evaluating Language Agents in Cybersecurity Environments | arXiv | 2025.06.10 | Paper Link
CyberGym: Evaluating AI Agents' Cybersecurity Capabilities with Real-World Vulnerabilities at Scale | arXiv | 2025.06.03 | Paper Link
DFIR-Metric: A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response | arXiv | 2025.05.26 | Paper Link
VADER: A Human-Evaluated Benchmark for Vulnerability Assessment, Detection, Explanation, and Remediation | arXiv | 2025.05.26 | Paper Link
BinMetric: A Comprehensive Binary Analysis Benchmark for Large Language Models | arXiv | 2025.05.12 | Paper Link
The Digital Cybersecurity Expert: How Far Have We Come? | arXiv | 2025.04.16 | Paper Link
On Benchmarking Code LLMs for Android Malware Analysis | arXiv | 2025.04.01 | Paper Link
CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities | arXiv | 2025.03.21 | Paper Link
Benchmarking LLMs and LLM-based Agents in Practical Vulnerability Detection for Code Repositories | arXiv | 2025.03.05 | Paper Link
AttackSeqBench: Benchmarking Large Language Models' Understanding of Sequential Patterns in Cyber Attacks | arXiv | 2025.03.05 | Paper Link
CyberLLMInstruct: A New Dataset for Analysing Safety of Fine-Tuned LLMs Using Cyber Security Data | arXiv | 2025.03.12 | Paper Link
Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training | arXiv | 2025.02.16 | Paper Link
ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks | arXiv | 2025.02.07 | Paper Link
SecBench: A Comprehensive Multi-Dimensional Benchmarking Dataset for LLMs in Cybersecurity | arXiv | 2024.12.31 | Paper Link
AI Cyber Risk Benchmark: Automated Exploitation Capabilities | arXiv | 2024.12.09 | Paper Link
CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity | arXiv | 2024.11.25 | Paper Link
AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset | arXiv | 2024.08.09 | Paper Link
CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models | arXiv | 2024.08.03 | Paper Link
eyeballvul: a future-proof benchmark for vulnerability detection in the wild | arXiv | 2024.07.11 | Paper Link
NYU CTF Dataset: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | arXiv | 2024.06.09 | Paper Link
SECURE: Benchmarking Generative Large Language Models for Cybersecurity Advisory | arXiv | 2024.05.30 | Paper Link
Assessing Cybersecurity Vulnerabilities in Code Large Language Models | arXiv | 2024.04.29 | Paper Link
Can LLMs Understand Computer Networks? Towards a Virtual System Administrator | arXiv | 2024.04.22 | Paper Link
LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations | IEEE/ACM International Conference on Mining Software Repositories | 2023.03.16 | Paper Link
OpsEval: A Comprehensive IT Operations Benchmark Suite for Large Language Models | arXiv | 2024.02.16 | Paper Link
Can LLMs Patch Security Issues? | arXiv | 2024.02.19 | Paper Link
CyberMetric: A Benchmark Dataset for Evaluating Large Language Models' Knowledge in Cybersecurity | arXiv | 2024.02.12 | Paper Link
DebugBench: Evaluating Debugging Capability of Large Language Models | ACL Findings | 2024.01.11 | Paper Link
SecurityEval Dataset: Mining Vulnerability Examples to Evaluate Machine Learning-Based Code Generation Techniques | Proceedings of the 1st International Workshop on Mining Software Repositories Applications for Privacy and Security | 2022.11.09 | Paper Link
SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security | arXiv | 2023.12.26 | Paper Link
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models | arXiv | 2023.12.07 | Paper Link
An Empirical Study of NetOps Capability of Pre-Trained Large Language Models | arXiv | 2023.09.19 | Paper Link
SecEval: A Comprehensive Benchmark for Evaluating Cybersecurity Knowledge of Foundation Models | GitHub | 2023 | Paper Link

Fine-tuned Domain LLMs for Cybersecurity

Llama-3.1-FoundationAI-SecurityLLM-8B-Instruct Technical Report | arXiv | 2025.08.01 | Paper Link
Cyber-Zero: Training Cybersecurity Agents without Runtime | arXiv | 2025.07.29 | Paper Link
PhishIntentionLLM: Uncovering Phishing Website Intentions through Multi-Agent Retrieval-Augmented Generation | arXiv | 2025.07.21 | Paper Link
Less Data, More Security: Advancing Cybersecurity LLMs Specialization via Resource-Efficient Domain-Adaptive Continuous Pre-training with Minimal Tokens | arXiv | 2025.06.30 | Paper Link
Large Language Model-driven Security Assistant for Internet of Things via Chain-of-Thought | arXiv | 2025.05.08 | Paper Link
Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report | arXiv | 2025.04.28 | Paper Link
TrafficLLM: Enhancing Large Language Models for Network Traffic Analysis with Generic Traffic Representation | arXiv | 2025.04.05 | Paper Link
CyberBOT: Towards Reliable Cybersecurity Education via Ontology-Grounded Retrieval Augmented Generation | arXiv | 2025.04.01 | Paper Link
Phishsense-1B: A Technical Perspective on an AI-Powered Phishing Detection Model | arXiv | 2025.03.14 | Paper Link
ELTEX: A Framework for Domain-Driven Synthetic Data Generation | arXiv | 2025.03.19 | Paper Link
Fine-tuning Large Language Models for DGA and DNS Exfiltration Detection | arXiv | 2024.11.07 | Paper Link
AttackQA: Development and Adoption of a Dataset for Assisting Cybersecurity Operations using Fine-tuned and Open-Source LLMs | arXiv | 2024.11.02 | Paper Link
Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments | arXiv | 2024.09.17 | Paper Link
CyberPal.AI: Empowering LLMs with Expert-Driven Cybersecurity Instructions | arXiv | 2024.08.18 | Paper Link
IoT-LM: Large Multisensory Language Models for the Internet of Things | arXiv | 2024.07.13 | Paper Link
A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Automated Program Repair | arXiv | 2024.06.09 | Paper Link
Security Vulnerability Detection with Multitask Self-Instructed Fine-Tuning of Large Language Models | arXiv | 2024.06.09 | Paper Link
Transforming Computer Security and Public Trust Through the Exploration of Fine-Tuning Large Language Models | arXiv | 2024.06.02 | Paper Link
Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns | arXiv | 2024.04.30 | Paper Link
Nova+: Generative Language Models for Binaries | arXiv | 2023.11.27 | Paper Link
Instruction Tuning for Secure Code Generation | ICML | 2024.02.14 | Paper Link
Efficient Avoidance of Vulnerabilities in Auto-completed Smart Contract Code Using Vulnerability-constrained Decoding | ISSRE | 2023.10.06 | Paper Link
RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair | arXiv | 2024.03.11 | Paper Link
Finetuning Large Language Models for Vulnerability Detection | arXiv | 2024.02.29 | Paper Link
Large Language Models for Test-Free Fault Localization | ICSE | 2023.10.03 | Paper Link
HackMentor: Fine-tuning Large Language Models for Cybersecurity | TrustCom | 2023.09 | Paper Link
Owl: A Large Language Model for IT Operations | ICLR | 2023.09.17 | Paper Link
SecureFalcon: The Next Cyber Reasoning System for Cyber Security | arXiv | 2023.07.13 | Paper Link

RQ2: What are the potential applications of LLMs in cybersecurity?

Threat Intelligence

DroidTTP: Mapping Android Applications with TTP for Cyber Threat Intelligence | arXiv | 2025.03.20 | Paper Link
A Systematic Approach to Predict the Impact of Cybersecurity Vulnerabilities Using LLMs | arXiv | 2025.08.25 | Paper Link
Enabling Transparent Cyber Threat Intelligence Combining Large Language Models and Domain Ontologies | arXiv | 2025.08.27 | Paper Link
False Alarms, Real Damage: Adversarial Attacks Using LLM-based Models on Text-based Cyber Threat Intelligence Systems | arXiv | 2025.07.05 | Paper Link
LRCTI: A Large Language Model-Based Framework for Multi-Step Evidence Retrieval and Reasoning in Cyber Threat Intelligence Credibility Verification | arXiv | 2025.07.15 | Paper Link
Towards Effective Identification of Attack Techniques in Cyber Threat Intelligence Reports using Large Language Models | arXiv | 2025.05.05 | Paper Link
Can We Enhance Bug Report Quality Using LLMs?: An Empirical Study of LLM-Based Bug Report Generation | arXiv | 2025.04.26 | Paper Link
MaLAware: Automating the Comprehension of Malicious Software Behaviours using Large Language Models (LLMs) | arXiv | 2025.04.01 | Paper Link
LLM-Assisted Proactive Threat Intelligence for Automated Reasoning | arXiv | 2025.04.01 | Paper Link
Large Language Models are Unreliable for Cyber Threat Intelligence | arXiv | 2025.03.29 | Paper Link
Cyber Defense Reinvented: Large Language Models as Threat Intelligence Copilots | arXiv | 2025.02.28 | Paper Link
Labeling NIDS Rules with MITRE ATT&CK Techniques: Machine Learning vs. Large Language Models | arXiv | 2024.12.16 | Paper Link
IntellBot: Retrieval Augmented LLM Chatbot for Cyber Threat Knowledge Delivery | arXiv | 2024.11.08 | Paper Link
CTINEXUS: Leveraging Optimized LLM In-Context Learning for Constructing Cybersecurity Knowledge Graphs Under Data Scarcity | arXiv | 2024.10.28 | Paper Link
AI-Driven Cyber Threat Intelligence Automation | arXiv | 2024.10.27 | Paper Link
Cyber Knowledge Completion Using Large Language Models | arXiv | 2024.09.24 | Paper Link
Evaluating the Usability of LLMs in Threat Intelligence Enrichment | arXiv | 2024.09.23 | Paper Link
KGV: Integrating Large Language Models with Knowledge Graphs for Cyber Threat Intelligence Credibility Assessment | arXiv | 2024.08.15 | Paper Link
Usefulness of data flow diagrams and large language models for security threat validation: a registered report | arXiv | 2024.08.14 | Paper Link
A RAG-Based Question-Answering Solution for Cyber-Attack Investigation and Attribution | arXiv | 2024.08.12 | Paper Link
The Use of Large Language Models (LLM) for Cyber Threat Intelligence (CTI) in Cybercrime Forums | arXiv | 2024.08.08 | Paper Link
Psychological Profiling in Cybersecurity: A Look at LLMs and Psycholinguistic Features | arXiv | 2024.08.09 | Paper Link
Using LLMs to Automate Threat Intelligence Analysis Workflows in Security Operation Centers | arXiv | 2024.07.18 | Paper Link
LLMCloudHunter: Harnessing LLMs for Automated Extraction of Detection Rules from Cloud-Based CTI | arXiv | 2024.07.06 | Paper Link
Actionable Cyber Threat Intelligence using Knowledge Graphs and Large Language Models | arXiv | 2024.06.30 | Paper Link
AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models | EuroS&P Workshop | 2024.05.08 | Paper Link
SEvenLLM: Benchmarking, Eliciting, and Enhancing Abilities of Large Language Models in Cyber Threat Intelligence | arXiv | 2024.05.06 | Paper Link
Crimson: Empowering Strategic Reasoning in Cybersecurity through Large Language Models | arXiv | 2024.03.01 | Paper Link
Evaluation of LLM Chatbots for OSINT-based Cyber Threat Awareness | Expert Syst. Appl. | 2024.03.13 | Paper Link
LOCALINTEL: Generating Organizational Threat Intelligence from Global and Local Cyber Knowledge | arXiv | 2024.01.18 | Paper Link
Advancing TTP Analysis: Harnessing the Power of Encoder-Only and Decoder-Only Language Models with Retrieval Augmented Generation | arXiv | 2024.01.12 | Paper Link
ChatGPT, Llama, can you write my report? An experiment on assisted digital forensics reports written using (Local) Large Language Models | Forensic Sci. Int. Digit. Investig. | 2023.12.22 | Paper Link
HW-V2W-Map: Hardware Vulnerability to Weakness Mapping Framework for Root Cause Analysis with GPT-assisted Mitigation Suggestion | arXiv | 2023.12.21 | Paper Link
AGIR: Automating Cyber Threat Intelligence Reporting with Natural Language Generation | BigData | 2023.10.04 | Paper Link
Cyber Sentinel: Exploring Conversational Agents in Streamlining Security Tasks with GPT-4 | arXiv | 2023.09.28 | Paper Link
Cupid: Leveraging ChatGPT for More Accurate Duplicate Bug Report Detection | arXiv | 2023.08.27 | Paper Link
On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions | arXiv | 2023.08.22 | Paper Link
An Empirical Study on Using Large Language Models to Analyze Software Supply Chain Security Failures | Proceedings of the 2023 Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses | 2023.08.09 | Paper Link
Time for aCTIon: Automated Analysis of Cyber Threat Intelligence in the Wild | arXiv | 2023.07.14 | Paper Link

FUZZ

LLM-Assisted Model-Based Fuzzing of Protocol Implementations | arXiv | 2025.08.03 | Paper Link
Fuzzing: Randomness? Reasoning! Efficient Directed Fuzzing via Large Language Models | arXiv | 2025.06.30 | Paper Link
Directed Greybox Fuzzing via Large Language Model | arXiv | 2025.05.06 | Paper Link
ToolFuzz -- Automated Agent Tool Testing | arXiv | 2025.03.06 | Paper Link
Towards Reliable LLM-Driven Fuzz Testing: Vision and Road Ahead | arXiv | 2025.03.02 | Paper Link
Your Fix Is My Exploit: Enabling Comprehensive DL Library API Fuzzing with Large Language Models | arXiv | 2025.01.08 | Paper Link
Large Language Model assisted Hybrid Fuzzing | arXiv | 2024.12.19 | Paper Link
Harnessing Large Language Models for Seed Generation in Greybox Fuzzing | arXiv | 2024.11.27 | Paper Link
ChatHTTPFuzz: Large Language Model-Assisted IoT HTTP Fuzzing | arXiv | 2024.11.18 | Paper Link
AutoSafeCoder: A Multi-Agent Framework for Securing LLM Code Generation through Static Analysis and Fuzz Testing | arXiv | 2024.11.05 | Paper Link
FuzzCoder: Byte-level Fuzzing Test via Large Language Model | arXiv | 2024.09.03 | Paper Link
An Exploratory Study on Using Large Language Models for Mutation Testing | arXiv | 2024.06.14 | Paper Link
Prompt Fuzzing for Fuzz Driver Generation | ACM CCS 2024 | 2024.05.29 | Paper Link
When Fuzzing Meets LLMs: Challenges and Opportunities | ACM International Conference on the Foundations of Software Engineering | 2024.04.25 | Paper Link
Fuzzing BusyBox: Leveraging LLM and Crash Reuse for Embedded Bug Unearthing | USENIX | 2024.03.06 | Paper Link
Large language model guided protocol fuzzing | NDSS | 2024.02.26 | Paper Link
Fuzz4All: Universal Fuzzing with Large Language Models | ICSE | 2024.01.15 | Paper Link
How well does LLM generate security tests? | arXiv | 2023.10.03 | Paper Link
CODAMOSA: Escaping Coverage Plateaus in Test Generation with Pre-trained Large Language Models | ICSE | 2023.07.26 | Paper Link
Understanding Large Language Model Based Fuzz Driver Generation | arXiv | 2023.07.24 | Paper Link
Large Language Models Are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models | ISSTA | 2023.06.07 | Paper Link
Augmenting Greybox Fuzzing with Generative AI | arXiv | 2023.06.11 | Paper Link
Large Language Models are Edge-Case Fuzzers: Testing Deep Learning Libraries via FuzzGPT | arXiv | 2023.04.04 | Paper Link

Vulnerabilities Detection

SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks | arXiv | 2025.06.13 | Paper Link
Large Language Models Versus Static Code Analysis Tools: A Systematic Benchmark for Vulnerability Detection | arXiv | 2025.08.06 | Paper Link
A Systematic Literature Review on Detecting Software Vulnerabilities with Large Language Models | arXiv | 2025.07.30 | Paper Link
Out of Distribution, Out of Luck: How Well Can LLMs Trained on Vulnerability Datasets Detect Top 25 CWE Weaknesses? | arXiv | 2025.07.29 | Paper Link
LLMxCPG: Context-Aware Vulnerability Detection Through Code Property Graph-Guided Large Language Models | USENIX | 2025.07.22 | Paper Link
Revisiting Pre-trained Language Models for Vulnerability Detection | arXiv | 2025.07.22 | Paper Link
MalCodeAI: Autonomous Vulnerability Detection and Remediation via Language Agnostic Code Reasoning | arXiv | 2025.07.15 | Paper Link
Identifying Helpful Context for LLM-based Vulnerability Repair: A Preliminary Study | arXiv | 2025.06.13 | Paper Link
VulStamp: Vulnerability Assessment using Large Language Model | arXiv | 2025.06.13 | Paper Link
Large Language Models for Multilingual Vulnerability Detection: How Far Are We? | arXiv | 2025.06.09 | Paper Link
Boosting Vulnerability Detection of LLMs via Curriculum Preference Optimization with Synthetic Reasoning Data | arXiv | 2025.06.09 | Paper Link
Let the Trial Begin: A Mock-Court Approach to Vulnerability Detection using LLM-Based Agents | arXiv | 2025.05.16 | Paper Link
A Preliminary Study of Large Language Models for Multilingual Vulnerability Detection | arXiv | 2025.05.12 | Paper Link
Enhancing Large Language Models with Faster Code Preprocessing for Vulnerability Detection | arXiv | 2025.05.08 | Paper Link
LASHED: LLMs And Static Hardware Analysis for Early Detection of RTL Bugs | arXiv | 2025.04.30 | Paper Link
LLMpatronous: Harnessing the Power of LLMs For Vulnerability Detection | arXiv | 2025.04.25 | Paper Link
Context-Enhanced Vulnerability Detection Based on Large Language Model | arXiv | 2025.04.23 | Paper Link
Automated Static Vulnerability Detection via a Holistic Neuro-symbolic Approach | arXiv | 2025.04.22 | Paper Link
Everything You Wanted to Know About LLM-based Vulnerability Detection But Were Afraid to Ask | arXiv | 2025.04.18 | Paper Link
MOS: Towards Effective Smart Contract Vulnerability Detection through Mixture-of-Experts Tuning of Large Language Models | arXiv | 2025.04.16 | Paper Link
Malware analysis assisted by AI with R2AI | arXiv | 2025.04.10 | Paper Link
Large Language Model (LLM) for Software Security: Code Analysis, Malware Analysis, Reverse Engineering | arXiv | 2025.04.08 | Paper Link
CVE-Bench: Benchmarking LLM-based Software Engineering Agent's Ability to Repair Real-World CVE Vulnerabilities | NAACL | 2025.03 | Paper Link
Reasoning with LLMs for Zero-Shot Vulnerability Detection | arXiv | 2025.03.22 | Paper Link
Vulnerability Detection: From Formal Verification to Large Language Models and Hybrid Approaches: A Comprehensive Overview | arXiv | 2025.03.13 | Paper Link
CASTLE: Benchmarking Dataset for Static Code Analyzers and LLMs towards CWE Detection | arXiv | 2025.03.12 | Paper Link
Benchmarking Large Language Models for Multi-Language Software Vulnerability Detection | arXiv | 2025.03.03 | Paper Link
CVE-LLM : Ontology-Assisted Automatic Vulnerability Evaluation Using Large Language Models | arXiv | 2025.02.21 | Paper Link
Large Language Models in Software Security: A Survey of Vulnerability Detection Techniques and Insights | arXiv | 2025.02.10 | Paper Link
Large Language Models for In-File Vulnerability Localization Can Be "Lost in the End" | arXiv | 2025.02.09 | Paper Link
Streamlining Security Vulnerability Triage with Large Language Models | arXiv | 2025.01.31 | Paper Link
Evaluating Large Language Models in Vulnerability Detection Under Variable Context Windows | arXiv | 2025.01.30 | Paper Link
Helping LLMs Improve Code Generation Using Feedback from Testing and Static Analysis | arXiv | 2025.01.07 | Paper Link
CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection | arXiv | 2025.01.08 | Paper Link
Leveraging Large Language Models and Machine Learning for Smart Contract Vulnerability Detection | arXiv | 2025.01.04 | Paper Link
Investigating Large Language Models for Code Vulnerability Detection: An Experimental Study | arXiv | 2024.12.24 | Paper Link
Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection | arXiv | 2024.12.16 | Paper Link
ChatNVD: Advancing Cybersecurity Vulnerability Assessment with Large Language Models | arXiv | 2024.12.06 | Paper Link
CleanVul: Automatic Function-Level Vulnerability Detection in Code Commits Using LLM Heuristics | arXiv | 2024.11.26 | Paper Link
EnStack: An Ensemble Stacking Framework of Large Language Models for Enhanced Vulnerability Detection in Source Code | arXiv | 2024.11.25 | Paper Link
CryptoFormalEval: Integrating LLMs and Formal Verification for Automated Cryptographic Protocol Vulnerability Detection | arXiv | 2024.11.20 | Paper Link
Beyond Static Tools: Evaluating Large Language Models for Cryptographic Misuse Detection | arXiv | 2024.11.14 | Paper Link
LProtector: An LLM-driven Vulnerability Detection System | arXiv | 2024.11.04 | Paper Link
Enhancing Reverse Engineering: Investigating and Benchmarking Large Language Models for Vulnerability Analysis in Decompiled Binaries | arXiv | 2024.11.07 | Paper Link
ProveRAG: Provenance-Driven Vulnerability Analysis with Automated Retrieval-Augmented LLMs | arXiv | 2024.10.22 | Paper Link
RealVul: Can We Detect Vulnerabilities in Web Applications with LLM? | arXiv | 2024.10.10 | Paper Link
Code Vulnerability Repair with Large Language Model using Context-Aware Prompt Tuning | arXiv | 2024.09.27 | Paper Link
Boosting Cybersecurity Vulnerability Scanning based on LLM-supported Static Application Security Testing | arXiv | 2024.09.24 | Paper Link
VulnLLMEval: A Framework for Evaluating Large Language Models in Software Vulnerability Detection and Patching | arXiv | 2024.09.17 | Paper Link
Code Vulnerability Detection: A Comparative Analysis of Emerging Large Language Models | arXiv | 2024.09.16 | Paper Link
Exploring LLMs for Malware Detection: Review, Framework Design, and Countermeasure Approaches | arXiv | 2024.09.11 | Paper Link
SAFE: Advancing Large Language Models in Leveraging Semantic and Syntactic Relationships for Software Vulnerability Detection | arXiv | 2024.09.02 | Paper Link
Outside the Comfort Zone: Analysing LLM Capabilities in Software Vulnerability Detection | European Symposium on Research in Computer Security | 2024.08.29 | Paper Link
ANVIL: Anomaly-based Vulnerability Identification without Labelled Training Data | arXiv | 2024.08.28 | Paper Link
LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions | arXiv | 2024.08.14 | Paper Link
Exploring RAG-based Vulnerability Augmentation with LLMs | arXiv | 2024.08.08 | Paper Link
Harnessing the Power of LLMs in Source Code Vulnerability Detection | arXiv | 2024.08.07 | Paper Link
Towards Effectively Detecting and Explaining Vulnerabilities Using Large Language Models | arXiv | 2024.08.08 | Paper Link
Comparison of Static Application Security Testing Tools and Large Language Models for Repo-level Vulnerability Detection | arXiv | 2024.07.23 | Paper Link
SCoPE: Evaluating LLMs for Software Vulnerability Detection | arXiv | 2024.07.19 | Paper Link
Static Detection of Filesystem Vulnerabilities in Android Systems | arXiv | 2024.07.16 | Paper Link
Detect Llama -- Finding Vulnerabilities in Smart Contracts using Large Language Models | Information Security and Privacy | 2024.07.12 | Paper Link
Assessing the Effectiveness of LLMs in Android Application Vulnerability Analysis | arXiv | 2024.06.27 | Paper Link
MALSIGHT: Exploring Malicious Source Code and Benign Pseudocode for Iterative Binary Malware Summarization | arXiv | 2024.06.26 | Paper Link
Vul-RAG: Enhancing LLM-based Vulnerability Detection via Knowledge-level RAG | arXiv | 2024.06.19 | Paper Link
Generalization-Enhanced Code Vulnerability Detection via Multi-Task Instruction Fine-Tuning | ACL Findings | 2024.06.06 | Paper Link
LLM-Assisted Static Analysis for Detecting Security Vulnerabilities | arXiv | 2024.05.27 | Paper Link
Harnessing Large Language Models for Software Vulnerability Detection: A Comprehensive Benchmarking Study | arXiv | 2024.05.24 | Paper Link
DLAP: A Deep Learning Augmented Large Language Model Prompting Framework for Software Vulnerability Detection | Journal of Systems and Software | 2024.05.02 | Paper Link
Large Language Model for Vulnerability Detection and Repair: Literature Review and Roadmap | arXiv | 2024.04.04 | Paper Link
How Far Have We Gone in Vulnerability Detection Using Large Language Models | arXiv | 2023.12.22 | Paper Link
The FormAI Dataset: Generative AI in Software Security through the Lens of Formal Verification | International Conference on Predictive Models and Data Analytics in Software Engineering | 2023.09.02 | Paper Link
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection | International Symposium on Research in Attacks, Intrusions and Defenses | 2023.08.09 | Paper Link
How ChatGPT is Solving Vulnerability Management Problem | arXiv | 2023.11.11 | Paper Link
Multi-role Consensus through LLMs Discussions for Vulnerability Detection | arXiv | 2024.03.21 | Paper Link
LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning | arXiv | 2024.01.29 | Paper Link
LLbezpeky: Leveraging Large Language Models for Vulnerability Detection | arXiv | 2024.01.13 | Paper Link
Software Vulnerability Detection with GPT and In-Context Learning | DSC | 2024.01.08 | Paper Link
GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis | ICSE | 2023.12.25 | Paper Link
Understanding the Effectiveness of Large Language Models in Detecting Security Vulnerabilities | arXiv | 2023.11.16 | Paper Link
The Hitchhiker's Guide to Program Analysis: A Journey with Large Language Models | arXiv | 2023.11.15 | Paper Link
Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives | TPS-ISA | 2023.10.16 | Paper Link
Large Language Models for Test-Free Fault Localization | ICSE | 2023.10.03 | Paper Link
DefectHunter: A Novel LLM-Driven Boosted-Conformer-based Code Vulnerability Detection Mechanism | arXiv | 2023.09.27 | Paper Link
Software Vulnerability Detection using Large Language Models | ISSRE Workshop | 2023.09.02 | Paper Link
Using ChatGPT as a Static Application Security Testing Tool | arXiv | 2023.08.28 | Paper Link
Prompt-Enhanced Software Vulnerability Detection Using ChatGPT | ICSE | 2023.08.24 | Paper Link
VulLibGen: Identifying Vulnerable Third-Party Libraries via Generative Pre-Trained Model | arXiv | 2023.08.09 | Paper Link
Evaluation of ChatGPT Model for Vulnerability Detection | arXiv | 2023.04.12 | Paper Link
Software Vulnerability and Functionality Assessment using LLMs | arXiv | 2024.03.13 | Paper Link
Finetuning Large Language Models for Vulnerability Detection | arXiv | 2024.03.01 | Paper Link
Detecting software vulnerabilities using Language Models | CSR | 2023.02.23 | Paper Link

Insecure Code Generation

Since this part has evolved to focus more on Code LLM research, it is no longer actively maintained.

Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models | arXiv | 2025.02.09 | Paper Link
ContractTinker: LLM-Empowered Vulnerability Repair for Real-World Smart Contracts | arXiv | 2024.09.15 | Paper Link
An Exploratory Study on Fine-Tuning Large Language Models for Secure Code Generation | arXiv | 2024.08.17 | Paper Link
Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval | arXiv | 2024.07.04 | Paper Link
DistiLRR: Transferring Code Repair for Low-Resource Programming Languages | arXiv | 2024.06.20 | Paper Link
Code Repair with LLMs gives an Exploration-Exploitation Tradeoff | arXiv | 2024.05.30 | Paper Link
LLM Security Guard for Code | International Conference on Evaluation and Assessment in Software Engineering | 2024.05.03 | Paper Link
Do Neutral Prompts Produce Insecure Code? FormAI-v2 Dataset: Labelling Vulnerabilities in Code Generated by Large Language Models | arXiv | 2024.04.29 | Paper Link
Evolutionary Large Language Models for Hardware Security: A Comparative Survey | arXiv | 2024.04.25 | Paper Link
FLAG: Finding Line Anomalies (in code) with Generative AI | arXiv | 2023.07.22 | Paper Link
Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions | ICSE | 2023.10.24 | Paper Link
DebugBench: Evaluating Debugging Capability of Large Language Models | ACL Findings | 2024.01.11 | Paper Link
Shifting the Lens: Detecting Malware in npm Ecosystem with Large Language Models | arXiv | 2024.03.18 | Paper Link
Using ChatGPT to Analyze Ransomware Messages and to Predict Ransomware Threats | Research Square | 2023.11.21 | Paper Link
Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4 | arXiv | 2023.12.13 | Paper Link
Evaluating and Explaining Large Language Models for Code Using Syntactic Structures | arXiv | 2023.08.07 | Paper Link
Understanding Programs by Exploiting (Fuzzing) Test Cases | ACL Findings | 2023.01.12 | Paper Link
Large Language Models for Code Analysis: Do LLMs Really Do Their Job? | USENIX | 2024.03.05 | Paper Link
LLM4Decompile: Decompiling Binary Code with Large Language Models | EMNLP | 2024.03.08 | Paper Link
Pop Quiz! Can a Large Language Model Help With Reverse Engineering? | arXiv | 2022.02.02 | Paper Link
Large Language Models for Code: Security Hardening and Adversarial Testing | ACM SIGSAC Conference on Computer and Communications Security | 2023.09.29 | Paper Link
How Secure is Code Generated by ChatGPT? | SMC | 2023.04.19 | Paper Link
A Comparative Study of Code Generation using ChatGPT 3.5 across 10 Programming Languages | arXiv | 2023.08.08 | Paper Link
Can Large Language Models Identify And Reason About Security Vulnerabilities? Not Yet | arXiv | 2023.12.19 | Paper Link
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation | NeurIPS | 2023.10.30 | Paper Link
Generate and Pray: Using SALLMS to Evaluate the Security of LLM Generated Code | arXiv | 2023.11.01 | Paper Link
No Need to Lift a Finger Anymore? Assessing the Quality of Code Generation by ChatGPT | IEEE Trans. Software Eng. | 2023.08.09 | Paper Link
The Effectiveness of Large Language Models (ChatGPT and CodeBERT) for Security-Oriented Code Analysis | arXiv | 2023.08.29 | Paper Link
Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s Code Contributions | S&P | 2021.12.16 | Paper Link
Bugs in Large Language Models Generated Code | arXiv | 2024.03.18 | Paper Link
Lost at C: A User Study on the Security Implications of Large Language Model Code Assistants | USENIX | 2023.02.27 | Paper Link

Program Repair

VulnRepairEval: An Exploit-Based Evaluation Framework for Assessing Large Language Model Vulnerability Repair Capabilities | arXiv | 2025.09.03 | Paper Link
Automated Repair of C Programs Using Large Language Models | arXiv | 2025.09.02 | Paper Link
On the Evaluation of Large Language Models in Multilingual Vulnerability Repair | arXiv | 2025.08.05 | Paper Link
Repair-R1: Better Test Before Repair | arXiv | 2025.07.30 | Paper Link
Repairing vulnerabilities without invisible hands. A differentiated replication study on LLMs | arXiv | 2025.07.28 | Paper Link
The Impact of Fine-tuning Large Language Models on Automated Program Repair | arXiv | 2025.07.26 | Paper Link
Bug Fixing with Broader Context: Enhancing LLM-Based Program Repair via Layered Knowledge Injection | arXiv | 2025.06.30 | Paper Link
Repair Ingredients Are All You Need: Improving Large Language Model-Based Program Repair via Repair Ingredients Search | arXiv | 2025.06.30 | Paper Link
A Survey of LLM-based Automated Program Repair: Taxonomies, Design Paradigms, and Applications | arXiv | 2025.06.30 | Paper Link
Empirical Evaluation of Generalizable Automated Program Repair with Large Language Models | arXiv | 2025.06.03 | Paper Link
Boosting Open-Source LLMs for Program Repair via Reasoning Transfer and LLM-Guided Reinforcement Learning | arXiv | 2025.06.04 | Paper Link
Fixing 7,400 Bugs for 1$: Cheap Crash-Site Program Repair | arXiv | 2025.05.19 | Paper Link
Adversarial Reasoning for Repair Based on Inferred Program Intent | arXiv | 2025.05.19 | Paper Link
Synthetic Code Surgery: Repairing Bugs and Vulnerabilities with LLMs and Synthetic Data | arXiv | 2025.05.12 | Paper Link
Automated Repair of Ambiguous Natural Language Requirements | arXiv | 2025.05.12 | Paper Link
Towards Effectively Leveraging Execution Traces for Program Repair with Code LLMs | arXiv | 2025.05.07 | Paper Link
The Art of Repair: Optimizing Iterative Program Repair with Instruction-Tuned Models | arXiv | 2025.05.05 | Paper Link
Adapting Knowledge Prompt Tuning for Enhanced Automated Program Repair | arXiv | 2025.04.02 | Paper Link
LLM4CVE: Enabling Iterative Automated Vulnerability Repair with Large Language Models | arXiv | 2025.01.07 | Paper Link
From Defects to Demands: A Unified, Iterative, and Heuristically Guided LLM-Based Framework for Automated Software Repair and Requirement Realization | arXiv | 2024.12.06 | Paper Link
Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair | arXiv | 2024.12.05 | Paper Link
Fixing Security Vulnerabilities with AI in OSS-Fuzz | arXiv | 2024.11.21 | Paper Link
A Comprehensive Survey of AI-Driven Advancements and Techniques in Automated Program Repair and Code Generation | arXiv | 2024.11.12 | Paper Link
The Best Defense is a Good Offense: Countering LLM-Powered Cyberattacks | arXiv | 2024.10.20 | Paper Link
APOLLO: A GPT-based tool to detect phishing emails and generate explanations that warn users | arXiv | 2024.10.10 | Paper Link
Fixing Code Generation Errors for Large Language Models | arXiv | 2024.09.01 | Paper Link
MergeRepair: An Exploratory Study on Merging Task-Specific Adapters in Code LLMs for Automated Program Repair | arXiv | 2024.08.26 | Paper Link
Automated Software Vulnerability Patching using Large Language Models | arXiv | 2024.08.24 | Paper Link
Enhancing LLM-Based Automated Program Repair with Design Rationales | ASE | 2024.08.22 | Paper Link
RePair: Automated Program Repair with Process-based Feedback | ACL Findings | 2024.08.21 | Paper Link
Revisiting Evolutionary Program Repair via Code Language Model | arXiv | 2024.08.20 | Paper Link
ThinkRepair: Self-Directed Automated Program Repair | ACM SIGSOFT International Symposium on Software Testing and Analysis | 2024.07.30 | Paper Link
Automated C/C++ Program Repair for High-Level Synthesis via Large Language Models | ACM/IEEE International Symposium on Machine Learning for CAD | 2024.07.04 | Paper Link
Hybrid Automated Program Repair by Combining Large Language Models and Program Analysis | arXiv | 2024.06.04 | Paper Link
A Case Study of LLM for Automated Vulnerability Repair: Assessing Impact of Reasoning and Patch Validation Feedback | Proceedings of the 1st ACM International Conference on AI-Powered Software | 2024.05.24 | Paper Link
Automated Repair of AI Code with Large Language Models and Formal Verification | arXiv | 2024.05.14 | Paper Link
A Systematic Literature Review on Large Language Models for Automated Program Repair | arXiv | 2024.05.12 | Paper Link
Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models | arXiv | 2024.03.23 | Paper Link
How Far Can We Go with Practical Function-Level Program Repair? | arXiv | 2024.04.19 | Paper Link
Multi-Objective Fine-Tuning for Enhanced Program Repair with LLMs | arXiv | 2024.04.22 | Paper Link
Aligning LLMs for FL-free Program Repair | arXiv | 2024.04.13 | Paper Link
When Large Language Models Confront Repository-Level Automatic Program Repair: How Well They Done? | ICSE | 2023.03.01 | Paper Link
ContrastRepair: Enhancing Conversation-Based Automated Program Repair via Contrastive Test Case Pairs | arXiv | 2024.03.07 | Paper Link
LLM-Powered Code Vulnerability Repair with Reinforcement Learning and Semantic Reward | arXiv | 2024.02.22 | Paper Link
Copiloting the Copilots: Fusing Large Language Models with Completion Engines for Automated Program Repair | ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering | 2023.11.08 | Paper Link
Better Patching Using LLM Prompting, via Self-Consistency | ASE | 2023.08.16 | Paper Link
Teaching Large Language Models to Self-Debug | ICLR | 2023.10.05 | Paper Link
Enhanced Automated Code Vulnerability Repair using Large Language Models | Eng. Appl. Artif. Intell. | 2024.01.08 | Paper Link
A Study of Vulnerability Repair in JavaScript Programs with Large Language Models | WWW | 2023.03.19 | Paper Link
Fixing Hardware Security Bugs with Large Language Models | arXiv | 2023.02.02 | Paper Link
DIVAS: An LLM-based End-to-End Framework for SoC Security Analysis and Policy-based Protection | arXiv | 2023.08.14 | Paper Link
ZeroLeak: Using LLMs for Scalable and Cost Effective Side-Channel Patching | arXiv | 2023.08.24 | Paper Link
InferFix: End-to-End Program Repair with LLMs | ESEC/FSE | 2023.03.13 | Paper Link
Can LLMs Patch Security Issues? | arXiv | 2024.02.19 | Paper Link
How Effective Are Neural Networks for Fixing Security Vulnerabilities | ISSTA | 2023.05.29 | Paper Link
Examining Zero-Shot Vulnerability Repair with Large Language Models | S&P | 2022.08.15 | Paper Link
Security Code Review by LLMs: A Deep Dive into Responses | arXiv | 2024.01.29 | Paper Link
Practical Program Repair in the Era of Large Pre-trained Language Models | arXiv | 2022.10.25 | Paper Link
AI-powered patching: the future of automated vulnerability fixes | Google | 2024.01.31 | Paper Link
An Analysis of the Automatic Bug Fixing Performance of ChatGPT | APR@ICSE | 2023.01.20 | Paper Link
Automatic Program Repair with OpenAI's Codex: Evaluating QuixBugs | arXiv | 2023.11.06 | Paper Link

Anomaly Detection / Defense

LLM-driven Provenance Forensics for Threat Investigation and Detection | arXiv | 2025.08.29 | Paper Link
FALCON: Autonomous Cyber Threat Intelligence Mining with LLMs for IDS Rule Generation | arXiv | 2025.08.26 | Paper Link
Chimera: Harnessing Multi-Agent LLMs for Automatic Insider Threat Simulation | arXiv | 2025.08.11 | Paper Link
Think Broad, Act Narrow: CWE Identification with Multi-Agent Large Language Models | arXiv | 2025.08.02 | Paper Link
OFCnetLLM: Large Language Model for Network Monitoring and Alertness | arXiv | 2025.07.30 | Paper Link
Large Language Model-Based Framework for Explainable Cyberattack Detection in Automatic Generation Control Systems | arXiv | 2025.07.29 | Paper Link
From Alerts to Intelligence: A Novel LLM-Aided Framework for Host-based Intrusion Detection | arXiv | 2025.07.15 | Paper Link
Can Large Language Models Improve Phishing Defense? A Large-Scale Controlled Experiment on Warning Dialogue Explanations | arXiv | 2025.07.10 | Paper Link
Large Language Models for Network Intrusion Detection Systems: Foundations, Implementations, and Future Directions | arXiv | 2025.07.07 | Paper Link
Adaptive Linguistic Prompting (ALP) Enhances Phishing Webpage Detection in Multimodal Large Language Models | arXiv | 2025.06.29 | Paper Link
Leveraging Large Language Model for Intelligent Log Processing and Autonomous Debugging in Cloud AI Platforms | arXiv | 2025.06.22 | Paper Link
SmartGuard: Leveraging Large Language Models for Network Attack Detection through Audit Log Analysis and Summarization | arXiv | 2025.06.20 | Paper Link
PhishDebate: An LLM-Based Multi-Agent Framework for Phishing Website Detection | arXiv | 2025.06.18 | Paper Link
LLM-Powered Intent-Based Categorization of Phishing Emails | arXiv | 2025.06.17 | Paper Link
Evaluating Large Language Models for Phishing Detection, Self-Consistency, Faithfulness, and Explainability | arXiv | 2025.06.16 | Paper Link
Training RL Agents for Multi-Objective Network Defense Tasks | arXiv | 2025.06.13 | Paper Link
A Unified Framework for Human AI Collaboration in Security Operations Centers with Trusted Autonomy | arXiv | 2025.06.01 | Paper Link
MultiPhishGuard: An LLM-based Multi-Agent System for Phishing Email Detection | arXiv | 2025.05.27 | Paper Link
IRCopilot: Automated Incident Response with Large Language Models | arXiv | 2025.05.27 | Paper Link
LLM-Driven APT Detection for 6G Wireless Networks: A Systematic Review and Taxonomy | arXiv | 2025.05.24 | Paper Link
Benchmarking LLMs in an Embodied Environment for Blue Team Threat Hunting | arXiv | 2025.05.17 | Paper Link
Automating Security Audit Using Large Language Model based Agent: An Exploration Experiment | arXiv | 2025.05.16 | Paper Link
On Technique Identification and Threat-Actor Attribution using LLMs and Embedding Models | arXiv | 2025.05.15 | Paper Link
Towards AI-Driven Human-Machine Co-Teaming for Adaptive and Agile Cyber Security Operation Centers | arXiv | 2025.05.09 | Paper Link
Large Language Models are Autonomous Cyber Defenders | arXiv | 2025.05.08 | Paper Link
Bridging Expertise Gaps: The Role of LLMs in Human-AI Collaboration for Cybersecurity | arXiv | 2025.05.06 | Paper Link
LLM-Based Threat Detection and Prevention Framework for IoT Ecosystems | arXiv | 2025.05.01 | Paper Link
Improving Phishing Email Detection Performance of Small Large Language Models | arXiv | 2025.04.29 | Paper Link
AnomalyGen: An Automated Semantic Log Sequence Generation Framework with LLM for Anomaly Detection | arXiv | 2025.04.16 | Paper Link
Investigating cybersecurity incidents using large language models in latest-generation wireless networks | arXiv | 2025.04.14 | Paper Link
SoK: LLM-based Log Parsing | arXiv | 2025.04.07 | Paper Link
Knowledge Transfer from LLMs to Provenance Analysis: A Semantic-Augmented Method for APT Detection | arXiv | 2025.03.24 | Paper Link
Large Language Models powered Network Attack Detection: Architecture, Opportunities and Case Study | arXiv | 2025.03.24 | Paper Link
Payload-Aware Intrusion Detection with CMAE and Large Language Models | arXiv | 2025.03.23 | Paper Link
RedChronos: A Large Language Model-Based Log Analysis System for Insider Threat Detection in Enterprises | arXiv | 2025.03.05 | Paper Link
Enhancing Cybersecurity in Critical Infrastructure with LLM-Assisted Explainable IoT Systems | arXiv | 2025.03.05 | Paper Link
Transforming Cyber Defense: Harnessing Agentic and Frontier AI for Proactive, Ethical Threat Intelligence | arXiv | 2025.02.28 | Paper Link
Cyber Defense Reinvented: Large Language Models as Threat Intelligence Copilots | arXiv | 2025.02.28 | Paper Link
Design and implementation of a distributed security threat detection system integrating federated learning and multimodal LLM | arXiv | 2025.02.28 | Paper Link
LAMD: Context-driven Android Malware Detection and Classification with LLMs | arXiv | 2025.02.18 | Paper Link
APT-LLM: Embedding-Based Anomaly Detection of Cyber Advanced Persistent Threats Using Large Language Models | arXiv | 2025.02.13 | Paper Link
AdaPhish: AI-Powered Adaptive Defense and Education Resource Against Deceptive Emails | arXiv | 2025.02.05 | Paper Link
SHIELD: APT Detection and Intelligent Explanation Using LLM | arXiv | 2025.02.04 | Paper Link
LLM-based event log analysis techniques: A survey | arXiv | 2025.02.02 | Paper Link
TORCHLIGHT: Shedding LIGHT on Real-World Attacks on Cloudless IoT Devices Concealed within the Tor Network | arXiv | 2025.01.28 | Paper Link
Confront Insider Threat: Precise Anomaly Detection in Behavior Logs Based on LLM Fine-Tuning | COLING | 2024 | Paper Link
Exploring Large Language Models for Semantic Analysis and Categorization of Android Malware | arXiv | 2025.01.08 | Paper Link
Large Multimodal Agents for Accurate Phishing Detection with Enhanced Token Optimization and Cost Reduction | arXiv | 2024.12.03 | Paper Link
LogLM: From Task-based to Instruction-based Automated Log Analysis | arXiv | 2024.10.12 | Paper Link
LogLLM: Log-based Anomaly Detection Using Large Language Models | arXiv | 2024.11.13 | Paper Link
Using Large Language Models for Template Detection from Security Event Logs | arXiv | 2024.09.08 | Paper Link
A Comparative Study on Large Language Models for Log Parsing | arXiv | 2024.09.04 | Paper Link
LUK: Empowering Log Understanding with Expert Knowledge from Large Language Models | arXiv | 2024.09.03 | Paper Link
XG-NID: Dual-Modality Network Intrusion Detection using a Heterogeneous Graph Neural Network and Large Language Model | arXiv | 2024.08.27 | Paper Link
LogParser-LLM: Advancing Efficient Log Parsing with Large Language Models | arXiv | 2024.08.25 | Paper Link
Automated Phishing Detection Using URLs and Webpages | arXiv | 2024.08.16 | Paper Link
Transformers and Large Language Models for Efficient Intrusion Detection Systems: A Comprehensive Survey | arXiv | 2024.08.14 | Paper Link
Multimodal Large Language Models for Phishing Webpage Detection and Identification | arXiv | 2024.08.12 | Paper Link
Utilizing Large Language Models to Optimize the Detection and Explainability of Phishing Websites | arXiv | 2024.08.11 | Paper Link
Towards Explainable Network Intrusion Detection using Large Language Models | arXiv | 2024.08.08 | Paper Link
Audit-LLM: Multi-Agent Collaboration for Log-based Insider Threat Detection | arXiv | 2024.07.12 | Paper Link
LogEval: A Comprehensive Benchmark Suite for Large Language Models In Log Analysis | arXiv | 2024.07.02 | Paper Link
Defending Against Social Engineering Attacks in the Age of LLMs | EMNLP | 2024.06.18 | Paper Link
Anomaly Detection on Unstable Logs with GPT Models | arXiv | 2024.06.11 | Paper Link
ULog: Unsupervised Log Parsing with Large Language Models through Log Contrastive Units | arXiv | 2024.06.11 | Paper Link
Generative AI-in-the-loop: Integrating LLMs and GPTs into the Next Generation Networks | arXiv | 2024.06.06 | Paper Link
Log Parsing with Self-Generated In-Context Learning and Self-Correction | arXiv | 2024.06.05 | Paper Link
Large Language Models in Wireless Application Design: In-Context Learning-enhanced Automatic Network Intrusion Detection | arXiv | 2024.05.17 | Paper Link
DoLLM: How Large Language Models Understanding Network Flow Data to Detect Carpet Bombing DDoS | arXiv | 2024.05.12 | Paper Link
LLMParser: An Exploratory Study on Using Large Language Models for Log Parsing | ICSE | 2024.04.27 | Paper Link
Large Language Models Spot Phishing Emails with Surprising Accuracy: A Comparative Analysis of Performance | arXiv | 2024.04.23 | Paper Link
ChatGPT for digital forensic investigation: The good, the bad, and the unknown | Forensic Science International: Digital Investigation | 2023.07.10 | Paper Link
HuntGPT: Integrating Machine Learning-Based Anomaly Detection and Explainable AI with Large Language Models (LLMs) | arXiv | 2023.09.27 | Paper Link
Revolutionizing Cyber Threat Detection with Large Language Models: A privacy-preserving BERT-based Lightweight Model for IoT/IIoT Devices | IEEE Access | 2024.02.08 | Paper Link
Explaining Tree Model Decisions in Natural Language for Network Intrusion Detection | arXiv | 2023.10.30 | Paper Link
Devising and Detecting Phishing: Large Language Models vs. Smaller Human Models | arXiv | 2023.11.30 | Paper Link
Prompted Contextual Vectors for Spear-Phishing Detection | arXiv | 2024.02.14 | Paper Link
Evaluating the Performance of ChatGPT for Spam Email Detection | arXiv | 2024.02.23 | Paper Link
An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach | arXiv | 2023.11.12 | Paper Link
Application of Large Language Models to DDoS Attack Detection | International Conference on Security and Privacy in Cyber-Physical Systems and Smart Vehicles | 2024.02.05 | Paper Link
Web Content Filtering through knowledge distillation of Large Language Models | WI-IAT | 2023.05.10 | Paper Link
Lemur: Log Parsing with Entropy Sampling and Chain-of-Thought Merging | arXiv | 2024.03.02 | Paper Link
Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies | ICPC | 2024.01.26 | Paper Link
LogGPT: Log Anomaly Detection via GPT | BigData | 2023.12.11 | Paper Link
LogGPT: Exploring ChatGPT for Log-Based Anomaly Detection | HPCC/DSS/SmartCity/DependSys | 2023.09.14 | Paper Link
Log-based Anomaly Detection based on EVT Theory with feedback | arXiv | 2023.09.30 | Paper Link
Benchmarking Large Language Models for Log Analysis, Security, and Interpretation | J. Netw. Syst. Manag. | 2023.11.24 | Paper Link

LLM Assisted Attack

An Automated Attack Investigation Approach Leveraging Threat-Knowledge-Augmented Large Language Models | arXiv | 2025.09.01 | Paper Link
Cybersecurity AI: Hacking the AI Hackers via Prompt Injection | arXiv | 2025.09.01 | Paper Link
SoK: Large Language Model-Generated Textual Phishing Campaigns End-to-End Analysis of Generation, Characteristics, and Detection | arXiv | 2025.08.29 | Paper Link
Pentest-R1: Towards Autonomous Penetration Testing Reasoning Optimized via Two-Stage Reinforcement Learning | arXiv | 2025.08.10 | Paper Link
PenTest2.0: Towards Autonomous Privilege Escalation Using GenAI | arXiv | 2025.08.09 | Paper Link
Prompt to Pwn: Automated Exploit Generation for Smart Contracts | arXiv | 2025.08.02 | Paper Link
Can We End the Cat-and-Mouse Game? Simulating Self-Evolving Phishing Attacks with LLMs and Genetic Algorithms | arXiv | 2025.07.29 | Paper Link
Exploiting Jailbreaking Vulnerabilities in Generative AI to Bypass Ethical Safeguards for Facilitating Phishing Attacks | arXiv | 2025.07.16 | Paper Link
LLMalMorph: On The Feasibility of Generating Variant Malware using Large-Language-Models | arXiv | 2025.07.13 | Paper Link
On the Surprising Efficacy of LLMs for Penetration-Testing | arXiv | 2025.07.01 | Paper Link
From Promise to Peril: Rethinking Cybersecurity Red and Blue Teaming in the Age of LLMs | arXiv | 2025.06.16 | Paper Link
On the Ethics of Using LLMs for Offensive Security | arXiv | 2025.06.10 | Paper Link
ReCopilot: Reverse Engineering Copilot in Binary Analysis | arXiv | 2025.05.22 | Paper Link
LLMs unlock new paths to monetizing exploits | arXiv | 2025.05.16 | Paper Link
AutoPentest: Enhancing Vulnerability Management With Autonomous LLM Agents | arXiv | 2025.05.15 | Paper Link
Offensive Security for AI Systems: Concepts, Practices, and Applications | arXiv | 2025.05.09 | Paper Link
Weaponizing Language Models for Cybersecurity Offensive Operations: Automating Vulnerability Assessment Report Validation; A Review Paper | arXiv | 2025.05.07 | Paper Link
PwnGPT: Automatic Exploit Generation Based on Large Language Models | ACL | 2025.04 | Paper Link
On the Feasibility of Using MultiModal LLMs to Execute AR Social Engineering Attacks | arXiv | 2025.04.16 | Paper Link
Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design | arXiv | 2025.04.14 | Paper Link
Red Teaming with Artificial Intelligence-Driven Cyberattacks: A Scoping Review | arXiv | 2025.03.25 | Paper Link
A Framework for Evaluating Emerging Cyberattack Capabilities of AI | arXiv | 2025.03.15 | Paper Link
Jailbreaking Generative AI: Empowering Novices to Conduct Phishing Attacks | arXiv | 2025.03.03 | Paper Link
CAI: An Open, Bug Bounty-Ready Cybersecurity AI | arXiv | 2025.04.15 | Paper Link
RapidPen: Fully Automated IP-to-Shell Penetration Testing with LLM-based Agents | arXiv | 2025.02.23 | Paper Link
Construction and Evaluation of LLM-based agents for Semi-Autonomous penetration testing | arXiv | 2025.02.21 | Paper Link
OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities | arXiv | 2025.02.18 | Paper Link
PenTest++: Elevating Ethical Hacking with AI and Automation | arXiv | 2025.02.13 | Paper Link
Can LLMs Hack Enterprise Networks? Autonomous Assumed Breach Penetration-Testing Active Directory Networks | arXiv | 2025.02.06 | Paper Link
On the Feasibility of Using LLMs to Execute Multistage Network Attacks | arXiv | 2025.01.27 | Paper Link
HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing | arXiv | 2024.12.02 | Paper Link
Hacking CTFs with Plain Agents | arXiv | 2024.12.03 | Paper Link
Evaluating and Improving the Robustness of Security Attack Detectors Generated by LLMs | arXiv | 2024.11.27 | Paper Link
AI-Augmented Ethical Hacking: A Practical Examination of Manual Exploitation and Privilege Escalation in Linux Environments | arXiv | 2024.11.26 | Paper Link
Next-Generation Phishing: How LLM Agents Empower Cyber Attackers | arXiv | 2024.11.22 | Paper Link
Adapting to Cyber Threats: A Phishing Evolution Network (PEN) Framework for Phishing Generation and Analyzing Evolution Patterns using Large Language Models | arXiv | 2024.11.18 | Paper Link
Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks | arXiv | 2024.11.18 | Paper Link
PentestAgent: Incorporating LLM Agents to Automated Penetration Testing | arXiv | 2024.11.07 | Paper Link
AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? | arXiv | 2024.11.02 | Paper Link
AutoPenBench: Benchmarking Generative Agents for Penetration Testing | arXiv | 2024.10.28 | Paper Link
Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements | arXiv | 2024.10.25 | Paper Link
On the Feasibility of Fully AI-automated Vishing Attacks | arXiv | 2024.09.20 | Paper Link
Hacking, The Lazy Way: LLM Augmented Pentesting | arXiv | 2024.09.14 | Paper Link
Is Generative AI the Next Tactical Cyber Weapon For Threat Actors? Unforeseen Implications of AI Generated Cyber Attacks | arXiv | 2024.08.23 | Paper Link
CIPHER: Cybersecurity Intelligent Penetration-testing Helper for Ethical Researcher | Sensors | 2024.08.21 | Paper Link
Using Retriever Augmented Large Language Models for Attack Graph Generation | arXiv | 2024.08.11 | Paper Link
Practical Attacks against Black-box Code Completion Engines | arXiv | 2024.08.05 | Paper Link
PenHeal: A Two-Stage LLM Framework for Automated Pentesting and Optimal Remediation | Proceedings of the Workshop on Autonomous Cybersecurity | 2024.07.25 | Paper Link
From Sands to Mansions: Enabling Automatic Full-Life-Cycle Cyberattack Construction with LLM | arXiv | 2024.07.24 | Paper Link
The Shadow of Fraud: The Emerging Danger of AI-powered Social Engineering and its Possible Cure | arXiv | 2024.07.22 | Paper Link
Tactics, Techniques, and Procedures (TTPs) in Interpreted Malware: A Zero-Shot Generation with Large Language Models | arXiv | 2024.07.11 | Paper Link
Assessing AI vs Human-Authored Spear Phishing SMS Attacks: An Empirical Study Using the TRAPD Method | arXiv | 2024.06.18 | Paper Link
Getting pwn’d by AI: Penetration Testing with Large Language Models | ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering | 2023.08.17 | Paper Link
RatGPT: Turning online LLMs into Proxies for Malware Attacks | arXiv | 2023.09.07 | Paper Link
AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks | arXiv | 2024.03.02 | Paper Link
PentestGPT: An LLM-empowered Automatic Penetration Testing Tool | USENIX | 2023.08.13 | Paper Link
From Text to MITRE Techniques: Exploring the Malicious Use of Large Language Models for Generating Cyber Attack Payloads | arXiv | 2023.05.24 | Paper Link
From Chatbots to PhishBots? - Preventing Phishing scams created using ChatGPT, Google Bard and Claude | arXiv | 2024.03.10 | Paper Link
Exploring the Dark Side of AI: Advanced Phishing Attack Design and Deployment Using ChatGPT | CNS | 2023.09.19 | Paper Link
Using Large Language Models for Cybersecurity Capture-The-Flag Challenges and Certification Questions | arXiv | 2023.08.21 | Paper Link
Evaluating LLMs for Privilege-Escalation Scenarios | arXiv | 2023.10.23 | Paper Link
Malla: Demystifying Real-world Large Language Model Integrated Malicious Services | USENIX | 2024.01.06 | Paper Link
LLMs Killed the Script Kiddie: How Agents Supported by Large Language Models Change the Landscape of Network Threat Testing | arXiv | 2023.10.10 | Paper Link
From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy | IEEE Access | 2023.07.03 | Paper Link
Impact of Big Data Analytics and ChatGPT on Cybersecurity | I3CS | 2023.05.22 | Paper Link
Identifying and Mitigating the Security Risks of Generative AI | Foundations and Trends in Privacy and Security | 2023.12.29 | Paper Link

Others

From Legacy to Standard: LLM-Assisted Transformation of Cybersecurity Playbooks into CACAO Format | arXiv | 2025.08.05 | Paper Link
Information Security Based on LLM Approaches: A Review | arXiv | 2025.07.24 | Paper Link
Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques | arXiv | 2025.07.18 | Paper Link
Cybersecurity AI: The Dangerous Gap Between Automation and Autonomy | arXiv | 2025.06.30 | Paper Link
Using LLMs for Security Advisory Investigations: How Far Are We? | arXiv | 2025.06.16 | Paper Link
Exposing the Impact of GenAI for Cybercrime: An Investigation into the Dark Side | arXiv | 2025.05.29 | Paper Link
Large Language Models for IT Automation Tasks: Are We There Yet? | arXiv | 2025.05.26 | Paper Link
Mitigating Cyber Risk in the Age of Open-Weight LLMs: Policy Gaps and Technical Realities | arXiv | 2025.05.21 | Paper Link
ACSE-Eval: Can LLMs threat model real-world cloud infrastructure? | arXiv | 2025.05.16 | Paper Link
LLMs Suitability for Network Security: A Case Study of STRIDE Threat Modeling | arXiv | 2025.05.06 | Paper Link
From Texts to Shields: Convergence of Large Language Models and Cybersecurity | arXiv | 2025.05.01 | Paper Link
Automatically Generating Rules of Malicious Software Packages via Large Language Model | arXiv | 2025.04.24 | Paper Link
Exploring the Role of Large Language Models in Cybersecurity: A Systematic Survey | arXiv | 2025.04.22 | Paper Link
SoK: Frontier AI's Impact on the Cybersecurity Landscape | arXiv | 2025.04.07 | Paper Link
Emerging Cyber Attack Risks of Medical AI Agents | arXiv | 2025.04.02 | Paper Link
Inducing Personality in LLM-Based Honeypot Agents: Measuring the Effect on Human-Like Agenda Generation | arXiv | 2025.03.25 | Paper Link
ChatIoT: Large Language Model-based Security Assistant for Internet of Things with Retrieval-Augmented Generation | arXiv | 2025.02.14 | Paper Link
Empowering AIOps: Leveraging Large Language Models for IT Operations Management | arXiv | 2025.01.21 | Paper Link
BARTPredict: Empowering IoT Security with LLM-Driven Cyber Threat Prediction | arXiv | 2025.01.03 | Paper Link
Toward Intelligent and Secure Cloud: Large Language Model Empowered Proactive Defense | arXiv | 2024.12.30 | Paper Link
Emerging Security Challenges of Large Language Models | arXiv | 2024.12.23 | Paper Link
Ontology-Aware RAG for Improved Question-Answering in Cybersecurity Education | arXiv | 2024.12.10 | Paper Link
Integrating Large Language Models with Internet of Things Applications | arXiv | 2024.10.25 | Paper Link
CmdCaliper: A Semantic-Aware Command-Line Embedding Model and Dataset for Security Research | EMNLP | 2024.10.02 | Paper Link
Advancing Cyber Incident Timeline Analysis Through Rule Based AI and Large Language Models | arXiv | 2024.09.25 | Paper Link
Contextualized AI for Cyber Defense: An Automated Survey using LLMs | arXiv | 2024.09.20 | Paper Link
LLM Honeypot: Leveraging Large Language Models as Advanced Interactive Honeypot Systems | arXiv | 2024.09.15 | Paper Link
ScriptSmith: A Unified LLM Framework for Enhancing IT Operations via Automated Bash Script Generation, Assessment, and Refinement | arXiv | 2024.09.12 | Paper Link
Beyond Detection: Leveraging Large Language Models for Cyber Attack Prediction in IoT Networks | arXiv | 2024.08.26 | Paper Link
MistralBSM: Leveraging Mistral-7B for Vehicular Networks Misbehavior Detection | arXiv | 2024.07.26 | Paper Link
MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation | arXiv | 2024.07.22 | Paper Link
Disassembling Obfuscated Executables with LLM | arXiv | 2024.07.12 | Paper Link
On Large Language Models in National Security Applications | arXiv | 2024.07.03 | Paper Link
Threat Modelling and Risk Analysis for Large Language Model (LLM)-Powered Applications | arXiv | 2024.06.16 | Paper Link
Exploring the Efficacy of Large Language Models (GPT-4) in Binary Reverse Engineering | arXiv | 2024.06.09 | Paper Link
A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions | arXiv | 2024.05.23 | Paper Link
LLMPot: Automated LLM-based Industrial Protocol and Physical Process Emulation for ICS Honeypots | arXiv | 2024.05.10 | Paper Link
Critical Infrastructure Protection: Generative AI, Challenges, and Opportunities | arXiv | 2024.05.08 | Paper Link
Large Language Models for Cyber Security: A Systematic Literature Review | arXiv | 2024.05.08 | Paper Link
AppPoet: Large Language Model based Android malware detection via multi-view prompt engineering | arXiv | 2024.04.29 | Paper Link
Act as a Honeytoken Generator! An Investigation into Honeytoken Generation with Large Language Models | arXiv | 2024.04.24 | Paper Link
How Far Have We Gone in Stripped Binary Code Understanding Using Large Language Models | arXiv | 2024.04.16 | Paper Link
Is Stack Overflow Obsolete? An Empirical Study of the Characteristics of ChatGPT Answers to Stack Overflow Questions | CHI | 2024.02.07 | Paper Link
Prompting Is All You Need: Automated Android Bug Replay with Large Language Models | ICSE | 2023.07.18 | Paper Link
Enhancing Network Management Using Code Generated by Large Language Models | Proceedings of the 22nd ACM Workshop on Hot Topics in Networks | 2023.08.11 | [Paper Link](https://arxiv.org/abs/2308.06261)
Employing LLMs for Incident Response Planning and Review | arXiv | 2024.03.02 | Paper Link
LLM in the Shell: Generative Honeypots | EuroS&P Workshop | 2024.02.09 | Paper Link
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations | arXiv | 2023.12.07 | Paper Link
Harnessing the Power of LLM to Support Binary Taint Analysis | arXiv | 2023.10.12 | Paper Link
LLM for SoC Security: A Paradigm Shift | IEEE Access | 2023.10.09 | Paper Link
Just-in-Time Security Patch Detection -- LLM At the Rescue for Data Augmentation | arXiv | 2023.12.12 | Paper Link
Anatomy of an AI-powered malicious social botnet | arXiv | 2023.07.30 | Paper Link
An LLM-based Framework for Fingerprinting Internet-connected Devices | ACM on Internet Measurement Conference | 2023.10.24 | Paper Link

RQ3: What are further research directions about the application of LLMs in cybersecurity?

Further Research: Agent4Cybersecurity

Training Language Model Agents to Find Vulnerabilities with CTF-Dojo | arxiv | 2025.08.25 | Paper Link
FaultLine: Automated Proof-of-Vulnerability Generation Using LLM Agents | arxiv | 2025.07.21 | Paper Link
From CVE Entries to Verifiable Exploits: An Automated Multi-Agent Framework for Reproducing CVEs | arxiv | 2025.09.02 | Paper Link
Multi-Agent Penetration Testing AI for the Web | arxiv | 2025.08.28 | Paper Link
CyberSleuth: Autonomous Blue-Team LLM Agent for Web Attack Forensics | arxiv | 2025.08.28 | Paper Link
BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems | arxiv | 2025.07.10 | Paper Link
AIRTBench: Measuring Autonomous AI Red Teaming Capabilities in Language Models | arxiv | 2025.06.17 | Paper Link
Measuring and Augmenting Large Language Models for Solving Capture-the-Flag Challenges | arxiv | 2025.06.21 | Paper Link
Towards Effective Offensive Security LLM Agents: Hyperparameter Tuning, LLM as a Judge, and a Lightweight CTF Benchmark | arxiv | 2025.08.05 | Paper Link
Autonomous Penetration Testing: Solving Capture-the-Flag Challenges with LLMs | arxiv | 2025.08.01 | Paper Link
AURA: A Multi-Agent Intelligence Framework for Knowledge-Enhanced Cyber Threat Attribution | arxiv | 2025.06.11 | Paper Link
Improving LLM Agents with Reinforcement Learning on Cryptographic CTF Challenges | arxiv | 2025.06.01 | Paper Link
RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models | arxiv | 2025.05.11 | Paper Link
RedTeamLLM: an Agentic AI framework for offensive security | arxiv | 2025.05.11 | Paper Link
AutoPatch: Multi-Agent Framework for Patching Real-World CVE Vulnerabilities | arxiv | 2025.05.07 | Paper Link
Agent That Debugs: Dynamic State-Guided Vulnerability Repair | arxiv | 2025.04.10 | Paper Link
CAI: An Open, Bug Bounty-Ready Cybersecurity AI | arxiv | 2025.04.08 | Paper Link
Agentic AI and the Cyber Arms Race | arxiv | 2025.03.10 | Paper Link
VulnBot: Autonomous Penetration Testing for A Multi-Agent Collaborative Framework | arXiv | 2025.01.23 | Paper Link
Multi-Agent Collaboration in Incident Response with Large Language Models | arXiv | 2024.12.03 | Paper Link
LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild | arXiv | 2024.10.17 | Paper Link
MarsCode Agent: AI-native Automated Bug Fixing | arXiv | 2024.09.04 | Paper Link
BreachSeek: A Multi-Agent Automated Penetration Tester | arXiv | 2024.08.31 | Paper Link
PhishAgent: A Robust Multimodal Agent for Phishing Webpage Detection | arXiv | 2024.08.20 | Paper Link
Using LLMs to Automate Threat Intelligence Analysis Workflows in Security Operation Centers | arXiv | 2024.07.18 | Paper Link
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities | arXiv | 2024.06.02 | Paper Link
Generative AI and Large Language Models for Cyber Security: All Insights You Need | arXiv | 2024.05.21 | Paper Link
Generative AI in Cybersecurity | arXiv | 2024.05.02 | Paper Link
Large Language Models for Networking: Workflow, Advances and Challenges | arXiv | 2024.04.29 | Paper Link
LLM Agents can Autonomously Exploit One-day Vulnerabilities | arXiv | 2024.04.17 | Paper Link
InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents | ACL Findings | 2024.03.25 | Paper Link
WIPI: A New Web Threat for LLM-Driven Web Agents | arXiv | 2024.02.26 | Paper Link
R-Judge: Benchmarking Safety Risk Awareness for LLM Agents | EMNLP Findings | 2024.02.18 | Paper Link
Large Language Models for Networking: Applications, Enabling Techniques, and Challenges | arXiv | 2023.11.29 | Paper Link
TaskWeaver: A Code-First Agent Framework | arXiv | 2023.12.01 | Paper Link
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents | arXiv | 2024.01.08 | Paper Link
From Summary to Action: Enhancing Large Language Models for Complex Tasks with Open World APIs | arXiv | 2024.02.28 | Paper Link
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs | ICLR | 2023.10.03 | Paper Link
The Rise and Potential of Large Language Model Based Agents: A Survey | arXiv | 2023.09.19 | Paper Link
TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage | arXiv | 2023.11.07 | Paper Link
Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides | ECAI | 2024.02.27 | Paper Link
LLM Agents can Autonomously Hack Websites | arXiv | 2024.02.16 | Paper Link
Out of the Cage: How Stochastic Parrots Win in Cyber Security Environments | ICAART | 2023.08.28 | Paper Link
LLMind: Orchestrating AI and IoT with LLM for Complex Task Execution | arXiv | 2024.02.20 | Paper Link
A unified cybersecurity framework for complex environments | Proceedings of the Annual Conference of the South African Institute of Computer Scientists and Information Technologists | 2018.09.26 | Paper Link
Cybersecurity Issues and Challenges | Handbook of research on cybersecurity issues and challenges for business and FinTech applications | 2022.08 | Paper Link

📖BibTeX

@article{zhang2025llms,
  title={When llms meet cybersecurity: A systematic literature review},
  author={Zhang, Jie and Bu, Haoyu and Wen, Hui and Liu, Yongji and Fei, Haiqiang and Xi, Rongrong and Li, Lun and Yang, Yun and Zhu, Hongsong and Meng, Dan},
  journal={Cybersecurity},
  volume={8},
  number={1},
  pages={1--41},
  year={2025},
  publisher={SpringerOpen}
}
