Optimizing Drone Deployment for Maximized User Connectivity in Areas of Interest Via Deep Reinforcement Learning

Journal Information

Journal of Network and Systems Management (2025) 33:49. DOI: https://doi.org/10.1007/s10922-025-09924-1

Authors

  • Kolichala Rajashekar¹
  • Ashutosh Garg³
  • Anand M. Baswade¹
  • Subhajit Sidhanta²

¹Department of Electrical Engineering, IIT Bhilai, Bhilai, India
²Department of Computer Science and Engineering, IIT Bhilai, Bhilai, India
³Department of Computer Science, IIT Bhilai, Bhilai, India

Corresponding Author: Kolichala Rajashekar (kolichalar@iitbhilai.ac.in)

Abstract

In areas with limited communication capabilities, such as disaster zones, network service providers can deploy groups of Unmanned Aerial Vehicles (UAVs), commonly referred to as drones, serving as Drone Base Stations (DBSs) to temporarily supplement traditional communication infrastructure. In these regions, survivors change their positions to seek protection as the affected area gradually expands. It is therefore necessary to adjust the positions of the DBSs to maintain adequate network performance and uninterrupted connectivity for users. This requires continuously updating the DBS positions in pursuit of a long-term objective. To achieve this, we use Deep Reinforcement Learning (DRL) to adaptively modify the position of the DBS in response to the varying locations of the User Equipment (UE). To this end, we design a Markov Decision Process (MDP) that accounts for the continuous nature of the states and actions involved in this problem, which is then solved with the Deep Deterministic Policy Gradient (DDPG) algorithm. The proposed DRL framework optimizes the DBS placement to maximize the number of connected users. Extensive simulation results demonstrate that the proposed DRL approach achieves higher user connectivity and faster adaptation to changes in user distribution compared to conventional methods. The results show a significant improvement in the average number of connected users and in the ability of the DBSs to cover users within the designated area.

Keywords

Deep Reinforcement Learning (DRL) · Drone Base Station (DBS) · User Equipment (UE) · Unmanned Aerial Vehicle (UAV) · Deep Deterministic Policy Gradient (DDPG) · Markov Decision Process (MDP)

Introduction

The increasing demand for reliable communication in disaster-stricken areas or at temporary events highlights the need for flexible and rapidly deployable network infrastructure. Traditional terrestrial base stations are often vulnerable or unavailable in such scenarios. UAVs, or drones, equipped as Drone Base Stations (DBSs), offer a promising solution by providing on-demand wireless connectivity. However, the dynamic movement of users in these areas, particularly in evolving disaster zones, necessitates continuous, adaptive adjustment of DBS positions to maintain optimal service. This paper addresses the challenge of optimizing DBS deployment in real time to maximize user connectivity, particularly when user locations are continuously changing. The problem is framed as a long-term optimization task, which is well suited to Deep Reinforcement Learning (DRL) techniques.

Methodology

The core of our approach involves a Deep Reinforcement Learning (DRL) framework, specifically utilizing the Deep Deterministic Policy Gradient (DDPG) algorithm, to adaptively control the positions of Drone Base Stations (DBSs).

Problem Formulation

  • System Model: We consider a scenario where multiple DBSs are deployed to serve User Equipment (UEs) within a specified area of interest. The UEs' positions are dynamic, simulating movement in a disaster zone.
  • Objective: Maximize the total number of connected users over time by optimally adjusting the 3D coordinates of each DBS.
  • Connectivity Model: User connectivity is determined by factors such as received signal strength, path loss, and line-of-sight (LoS) probability between DBSs and UEs. A connection is considered successful when the received signal-to-interference-plus-noise ratio (SINR) meets a minimum threshold (a channel-model sketch follows this list).
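
The exact channel equations are given in the paper; as a rough illustration, the sketch below implements the kind of connectivity test described above using a widely used air-to-ground LoS-probability model. The environment constants, excess-loss values, transmit power, noise floor, and SINR threshold are illustrative placeholders rather than the paper's parameters, and interference is ignored for brevity, so the check reduces to an SNR test.

```python
import numpy as np

# Illustrative channel parameters -- placeholders, not the values used in the paper.
A_ENV, B_ENV = 9.61, 0.16        # environment constants of the LoS-probability curve
ETA_LOS, ETA_NLOS = 1.0, 20.0    # excess path loss [dB] under LoS / NLoS
FREQ_HZ = 2e9                    # carrier frequency [Hz]
C = 3e8                          # speed of light [m/s]
SINR_THRESHOLD_DB = 0.0          # minimum SINR for a successful connection

def los_probability(dbs_pos, ue_pos):
    """LoS probability as a function of the DBS-UE elevation angle."""
    horiz = np.hypot(dbs_pos[0] - ue_pos[0], dbs_pos[1] - ue_pos[1])
    theta_deg = np.degrees(np.arctan2(dbs_pos[2], max(horiz, 1e-9)))
    return 1.0 / (1.0 + A_ENV * np.exp(-B_ENV * (theta_deg - A_ENV)))

def mean_path_loss_db(dbs_pos, ue_pos):
    """Free-space path loss plus LoS/NLoS excess loss, averaged over the LoS probability."""
    d = np.linalg.norm(np.asarray(dbs_pos) - np.array([ue_pos[0], ue_pos[1], 0.0]))
    fspl = 20 * np.log10(max(d, 1.0)) + 20 * np.log10(FREQ_HZ) + 20 * np.log10(4 * np.pi / C)
    p_los = los_probability(dbs_pos, ue_pos)
    return fspl + p_los * ETA_LOS + (1.0 - p_los) * ETA_NLOS

def is_connected(dbs_pos, ue_pos, tx_power_dbm=30.0, noise_dbm=-90.0):
    """A UE is connected if the received SNR meets the threshold (interference ignored)."""
    rx_dbm = tx_power_dbm - mean_path_loss_db(dbs_pos, ue_pos)
    return (rx_dbm - noise_dbm) >= SINR_THRESHOLD_DB
```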

Markov Decision Process (MDP) Design

The problem is formulated as an MDP to capture the continuous nature of states and actions:

  • State Space ($\mathcal{S}$): The state is defined by the 3D coordinates of all DBSs and the current 2D coordinates of all active UEs.
  • Action Space ($\mathcal{A}$): The action is a vector representing the change in 3D coordinates (position updates) for each DBS. Actions are continuous.
  • Reward Function ($\mathcal{R}$): The immediate reward at each time step is the total number of connected users, with additional penalties when a DBS moves outside the defined operational altitude range (illustrated in the sketch after this list).
  • Transition Probability ($\mathcal{P}$): The environment transitions based on the chosen action and the movement model of the UEs (e.g., random waypoint, social-force model).
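
As a concrete illustration of the reward above, the sketch below counts UEs connected to at least one DBS (reusing the `is_connected` helper from the connectivity sketch) and subtracts a penalty per DBS that leaves the altitude range; the bounds and penalty weight are illustrative placeholders, not the paper's values.

```python
# Illustrative altitude bounds [m] and penalty weight -- placeholders, not the paper's values.
H_MIN, H_MAX = 50.0, 300.0
ALTITUDE_PENALTY = 5.0

def step_reward(dbs_positions, ue_positions):
    """Per-step reward: connected-user count minus altitude-violation penalties."""
    connected = sum(
        any(is_connected(dbs, ue) for dbs in dbs_positions) for ue in ue_positions
    )
    violations = sum(1 for dbs in dbs_positions if not (H_MIN <= dbs[2] <= H_MAX))
    return connected - ALTITUDE_PENALTY * violations
```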

Deep Reinforcement Learning Algorithm (DDPG)

DDPG is chosen for its ability to handle continuous action spaces (a minimal implementation sketch follows the list below):

  • Actor-Critic Architecture:
    • Actor Network: Maps the current state to a continuous action (DBS position adjustments).
    • Critic Network: Estimates the Q-value (expected cumulative reward) for a given state-action pair.
  • Target Networks: Separate target actor and critic networks are used to stabilize training.
  • Experience Replay Buffer: Stores past (state, action, reward, next state) tuples to break correlations in the training data.
  • Noise Addition: Exploration is facilitated by adding Ornstein-Uhlenbeck process noise to the actor's actions during training.
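
A minimal PyTorch sketch of these components (actor, critic, Ornstein-Uhlenbeck exploration noise, and the soft update that keeps the target networks slowly tracking the online ones) is shown below; the layer sizes and noise parameters are common defaults, not the hyperparameters reported in the paper.

```python
import numpy as np
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a state to a continuous action (per-DBS position adjustments, scaled to [-1, 1])."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Estimates the Q-value of a state-action pair."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated exploration noise for the actor."""
    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.2):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = np.full(action_dim, mu)

    def sample(self):
        self.state = self.state + self.theta * (self.mu - self.state) \
            + self.sigma * np.random.randn(*self.state.shape)
        return self.state

def soft_update(target, source, tau=0.005):
    """Polyak-average the target network parameters toward the online network."""
    for t, s in zip(target.parameters(), source.parameters()):
        t.data.mul_(1.0 - tau).add_(tau * s.data)
```

During training, the actor's output plus OU noise is executed in the environment, the resulting transition is stored in the replay buffer, and minibatches sampled from the buffer are used to update the critic toward the Bellman target computed with the target networks and the actor along the critic's gradient, followed by a soft update of both target networks.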

Simulation Environment

The proposed framework is evaluated using a custom simulation environment that models:

  • Dynamic UE movement.
  • Realistic wireless channel models (LoS and NLoS components).
  • DBS mobility constraints.

The DRL agent interacts with this environment to learn optimal deployment strategies; a minimal training sketch is shown below.
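
Assuming the simulator is wrapped as a Gymnasium-compatible environment, the interaction could be driven by an off-the-shelf DDPG implementation such as Stable-Baselines3, which provides the replay buffer and target networks internally; the `DroneDeploymentEnv` class and the hyperparameters below are hypothetical and only illustrate the setup, not the authors' actual training code.

```python
import numpy as np
from stable_baselines3 import DDPG
from stable_baselines3.common.noise import OrnsteinUhlenbeckActionNoise

# Hypothetical Gymnasium-compatible wrapper around the simulator described above
# (observation = DBS and UE coordinates, action = continuous DBS position updates).
env = DroneDeploymentEnv(num_dbs=3, num_ues=100)

n_actions = env.action_space.shape[0]
action_noise = OrnsteinUhlenbeckActionNoise(
    mean=np.zeros(n_actions), sigma=0.2 * np.ones(n_actions)
)

model = DDPG("MlpPolicy", env, action_noise=action_noise, verbose=1)
model.learn(total_timesteps=200_000)

# Roll out the learned placement policy.
obs, _ = env.reset()
for _ in range(1_000):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
```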

Results

Extensive simulations were conducted to evaluate the performance of the proposed DRL-based DBS deployment strategy. The key findings include:

  • Improved User Connectivity: The DRL approach consistently achieved a higher average number of connected users compared to conventional methods (e.g., static deployment, greedy algorithms). This indicates its effectiveness in adapting to dynamic user distributions.
  • Faster Adaptation: The DRL agent demonstrated rapid convergence and adaptation to changes in user location patterns, ensuring continuous service provision even as the affected area evolves.
  • Optimal Coverage: The results show that DBSs, guided by the DRL agent, were able to intelligently position themselves to maximize coverage within the designated area, minimizing connectivity gaps.
  • Robustness: The DRL framework exhibited robustness against varying numbers of users and different mobility patterns, maintaining high performance across diverse scenarios.
  • Comparative Analysis: The proposed DRL method significantly outperformed baseline strategies in terms of user satisfaction and network performance metrics.

Conclusion

This paper presents a novel Deep Reinforcement Learning (DRL) framework for optimizing the real-time deployment of Drone Base Stations (DBSs) to maximize user connectivity in dynamic environments, such as disaster zones. By formulating the problem as a Markov Decision Process (MDP) and employing the Deep Deterministic Policy Gradient (DDPG) algorithm, our approach enables DBSs to adaptively adjust their positions in response to changing User Equipment (UE) locations. Simulation results clearly demonstrate that the proposed DRL-based strategy achieves superior user connectivity and faster adaptation capabilities compared to traditional methods. This work offers a promising solution for enhancing communication resilience and providing reliable network services in challenging scenarios. Future work could explore multi-agent DRL for collaborative DBS deployment and integration with real-world constraints.

Note: This README is a summary of the provided paper. For full details, please refer to the original publication via the DOI link.
