In-Network Impact of E-Commerce Bot Traffic

Introduction

The rapid growth of e-commerce has led to an increased prevalence of automated bots engaging in fraudulent activ- ities. Despite online businesses’ continuous efforts to enhance user experience through the adoption of modern technologies the disruptive presence of malicious bots poses a significant threat to the revenue and overall success of companies. we review the characteristics of bot traffic in real-world e-commerce websites and analyze how these characteristics can bring vulnerability while using in-network caching in the backend. We outline how the presence of bot traffic can lead to higher cache misses of legitimate users. As depicted, regular users typically request popular items, leading to the storage of these items within the in-network caching system. However, malicious bots follow a different pattern, actively soliciting less popular items and attempting to inject counterfeit popular items into the caching system.

Dataset

Gaining access to data for research in e-commerce and related areas like search and recommendation has proven to be a challenging endeavor. We conducted an analysis on several pre-existing datasets such as Twibot-20 [1], bot-net [2], Amazon [3], and Alibaba benchmark generator [4]. However, these datasets did not meet our criteria due to either lacking sufficient annotations or lacking essential query and user information. As a result, we conducted a thorough review, which encompassed an analysis of the characteristics of the web requests, web traffic, workload, bot traffic, and network caches. Then, we formulated a set of guiding principles for the precise generation of a bot traffic dataset. The following are the key principles we adhered to during the dataset creation process:

1- Search queries made by bots are often long search queries and tend to search for not so popular items in the e-commerce search engine.

2- User requests follow a Zipfian distribution. This observation signifies that specific segments of the data experience a higher volume of requests, a pattern that evolves over time.

3- The average time interval between consecutive requests made by bots is five times shorter than normal users.

To create a bot dataset in accordance with our learning guidelines, we utilized a Zipfian request distribution with an alpha value of 0.9, as detailed in the zipfian folder. This folder contains essential information for generating keys based on the Zipfian distribution. Bots primarily send requests for less popular keys, while legitimate users focus on popular items.

Additionally, the interarrival time between requests adheres to a normal distribution. Notably, bots exhibit a shorter interarrival time compared to legitimate users. The necessary details for generating interarrival time are available in the time folder.

License

The files are licensed under MIT License: LICENSE. The text of the license can also be found in the LICENSE file.

Citation

If you use this code, please cite our paper:

Hemmatpour, Masoud, Zheng, Changgang, and Noa Zilberman. "E-Commerce Bot Traffic: In-Network Impact,
Detection, and Mitigation", ICIN 2024.

References

[1] Shangbin Feng, Herun Wan, Ningnan Wang, Jundong Li, and Minnan Luo. Twibot-20: A comprehensive twitter bot detection benchmark. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pages 4485–4494, 2021.

[2] Elaheh Biglar Beigi, Hossein Hadian Jazi, Natalia Stakhanova, and Ali A Ghorbani. Towards effective feature selection in machine learning- based botnet detection approaches. In 2014 IEEE Conference on Communications and Network Security, pages 247–255. IEEE, 2014.

[3] Amazon. Shopping queries dataset: A large-scale esci benchmark for improving product search. https://github.com/amazon-science/esci-data, 2022.

[4] Alibaba. e-commerce search benchmark. https://github.com/alibaba/eCommerceSearchBench, 2020.

Name		Name	Last commit message	Last commit date
Latest commit History 36 Commits
image		image
time		time
zipfian		zipfian
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

In-Network Impact of E-Commerce Bot Traffic

Introduction

Dataset

License

Citation

References

About

Uh oh!

Releases

Packages

Languages

License

mashemat/inet_bot_traffic_detection

Folders and files

Latest commit

History

Repository files navigation

In-Network Impact of E-Commerce Bot Traffic

Introduction

Dataset

License

Citation

References

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages