Authors:
Valdemar Švábenský, Kristián Tkáčik, Aubrey Birdwell, Richard Weiss, Ryan S. Baker, Pavel Čeleda, Jan Vykopal, Jens Mache, Ankur Chattopadhyay
Paper:
https://arxiv.org/abs/2408.08531
Introduction
As cyber threats become increasingly complex, the demand for cybersecurity experts has surged. Effective teaching methods, such as hands-on exercises, are essential for training these experts. However, the complexity of cybersecurity exercises often leads to student frustration and impedes learning. This paper aims to develop automated tools to predict when a student is struggling, enabling instructors to provide timely assistance.
Goals and Scope of This Paper
The primary goal is to extract information from student actions in cybersecurity exercises to predict student success or potential risk of poor performance. The study focuses on university-level cybersecurity education but can be adapted to other contexts. The research poses two main questions:
1. How well do different machine learning classifiers predict (un)successful students in cybersecurity exercises?
2. Are the best classifiers in one context also the best in another context when trained using the same methods with a different student population?
Contributions to Research and Practice
The paper makes several contributions:
1. Collection of an original dataset from 313 students across two learning environments.
2. Automatic extraction of two feature sets from the data.
3. Training and evaluation of eight types of binary classification models for predicting student success.
4. Publication of data and code to support reproducibility and replicability.
Related Work in Predictive Models
Hands-on Cybersecurity Education
Previous studies have used student data to achieve various goals, such as clustering students based on performance. However, predicting student performance in cybersecurity exercises is less explored. For instance, Vinlove et al. used a small dataset and a single model to detect at-risk students, achieving 80% accuracy. Deng et al. used a naive Bayes classifier to predict course performance with 90.9% accuracy but evaluated only one classifier.
Other Areas of Computing Education
Research on student success prediction is more extensive in other computing domains. Hellas et al. reviewed 357 articles and noted that the best studies utilized data from multiple contexts and compared multiple methods. However, many studies had limitations such as single student populations and lack of data sharing.
Literature Gaps and Novelty of This Paper
The study addresses several gaps:
1. Focus on cybersecurity, a less-explored area compared to programming education.
2. Application of multi-contextual data from different institutions and semesters.
3. Prediction in smaller time frames, focusing on individual exercises rather than entire courses.
4. Comparison of various methods, evaluating eight models with a rich feature set.
5. Sharing of research artifacts, including data and code.
Research Methods
Format and Content of the Cybersecurity Exercises
The study involves two learning environments: KYPO CRP and EDURange. KYPO CRP exercises involve breaching vulnerable emulated hosts using a Kali Linux VM, with tasks presented via a web interface. EDURange includes attack and defense exercises, with tasks completed using Linux command-line tools in a VM.
Data Collection in the Two Learning Platforms
KYPO CRP collects command logs and event logs, while EDURange uses a single logging format for terminal data. The study collected data from 244 students in KYPO CRP and 69 students in EDURange.
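As a rough illustration of what processing such logs might look like, the sketch below parses a hypothetical JSON-lines command record. The actual log schemas of KYPO CRP and EDURange are not detailed in this summary, so all field names here (`timestamp`, `student`, `cmd`) are assumptions.

```python
import json

# Hypothetical record layout; the platforms' real schemas may differ.
raw_line = '{"timestamp": "2023-03-01T10:15:00", "student": "s01", "cmd": "nmap -sV 10.0.0.5"}'

def parse_command_log(line: str) -> dict:
    """Parse one JSON-encoded command-log record into a flat dict."""
    record = json.loads(line)
    tokens = record["cmd"].split()
    return {
        "timestamp": record["timestamp"],
        "student": record["student"],
        "command": tokens[0],       # base tool, e.g., "nmap"
        "arguments": tokens[1:],    # remaining flags and operands
    }

print(parse_command_log(raw_line))
```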
Data Cleaning and Filtering
The dataset underwent thorough manual and automated inspection to remove unreasonable values. The final dataset includes 21,659 command logs and 8,690 event logs from KYPO CRP, and 4,762 command logs from EDURange.
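A minimal sketch of such automated filtering, assuming a pandas table of command logs; the column names, exercise time window, and cleaning criteria below are illustrative, not the study's actual rules.

```python
import pandas as pd

# Toy command-log table; columns and thresholds are illustrative only.
logs = pd.DataFrame({
    "student": ["s01", "s01", "s02", "s03"],
    "timestamp": pd.to_datetime(
        ["2023-03-01 10:00", "2023-03-01 09:59", "2023-03-01 10:05", "1970-01-01 00:00"]),
    "command": ["nmap -sV 10.0.0.5", "", "ssh root@10.0.0.5", "ls"],
})

# Drop records with empty commands or timestamps outside the exercise window.
start, end = pd.Timestamp("2023-03-01 09:00"), pd.Timestamp("2023-03-01 12:00")
cleaned = logs[
    (logs["command"].str.len() > 0)
    & logs["timestamp"].between(start, end)
].reset_index(drop=True)

print(cleaned)
```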
Definition of Class Labels and Data Labeling
Exercise success is defined as at least 50% completion. In KYPO CRP, this means not displaying solutions for more than 50% of tasks and submitting correct answers. In EDURange, it means finishing at least half of the tasks. The dataset is imbalanced towards successful students, reflecting the settings in which the platforms were employed.
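The labeling rule itself is simple to express in code. Below is a minimal sketch that applies the 50%-completion criterion to a toy per-student task table; the table structure is assumed for illustration.

```python
import pandas as pd

# Per-student task outcomes; structure is assumed for illustration.
tasks = pd.DataFrame({
    "student": ["s01", "s01", "s01", "s02", "s02", "s02"],
    "completed": [True, True, False, False, False, True],
})

# Label a student as successful iff they completed at least 50% of the tasks.
completion = tasks.groupby("student")["completed"].mean()
labels = (completion >= 0.5).astype(int)  # 1 = successful, 0 = at risk

print(labels)
# s01 -> 1 (2/3 completed), s02 -> 0 (1/3 completed)
```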
Feature Extraction and Selection
For KYPO CRP, 25 features were engineered, while EDURange had 15 features. Features were derived from exercise problem-solving data, excluding personal information. Automated feature selection was applied using L1-regularized linear models.
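This selection step can be approximated in scikit-learn with `SelectFromModel` wrapped around an L1-penalized logistic regression, as sketched below on synthetic data. The regularization strength `C=0.1` and the data are assumptions, not the study's actual configuration.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(244, 25))        # 244 students x 25 engineered features
y = rng.integers(0, 2, size=244)      # synthetic success labels

# The L1 penalty drives weights of uninformative features to exactly zero,
# so SelectFromModel keeps only the features with nonzero coefficients.
X_scaled = StandardScaler().fit_transform(X)
selector = SelectFromModel(
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
).fit(X_scaled, y)

print("kept features:", np.flatnonzero(selector.get_support()))
```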
Model Training and Evaluation
Eight classifiers were systematically compared: logistic regression, naive Bayes, support vector machines (with linear and RBF kernels), k-nearest neighbors, decision tree, random forest, and XGBoost. Nested student-level cross-validation was used for model training and evaluation.
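A minimal sketch of nested cross-validation in scikit-learn, shown for the decision tree only: the inner loop tunes hyperparameters, while the outer loop estimates generalization performance. Since each student contributes exactly one data point, ordinary stratified folds already split at the student level. The fold counts, parameter grid, and synthetic data are assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(244, 25))        # synthetic stand-in for the feature matrix
y = rng.integers(0, 2, size=244)      # synthetic success labels

# Inner loop tunes hyperparameters; outer loop estimates generalization.
inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

model = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [3, 5, 10, None]},
    cv=inner,
    scoring="balanced_accuracy",
)
scores = cross_val_score(model, X, y, cv=outer, scoring="balanced_accuracy")
print(f"balanced accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```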
Results and Discussion
Classifier Performance in KYPO CRP
The decision tree achieved the highest balanced accuracy (88.4%) and sensitivity (86.9%), while the random forest had the best specificity (96.1%) and AUC (93.1%). The differences in balanced accuracy across models were small, suggesting that all of the classifiers are suitable for this context.
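For reference, the reported metrics can be computed from model outputs as follows; the labels and predicted probabilities below are toy values, not the study's data.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, recall_score, roc_auc_score

# Toy ground-truth labels and model outputs for illustration.
y_true = np.array([1, 1, 1, 0, 0, 1, 0, 1])
y_prob = np.array([0.9, 0.8, 0.4, 0.2, 0.6, 0.7, 0.1, 0.85])
y_pred = (y_prob >= 0.5).astype(int)

sensitivity = recall_score(y_true, y_pred, pos_label=1)  # true positive rate
specificity = recall_score(y_true, y_pred, pos_label=0)  # true negative rate

# Balanced accuracy is the mean of sensitivity and specificity.
print(f"balanced accuracy: {balanced_accuracy_score(y_true, y_pred):.3f}")
print(f"sensitivity: {sensitivity:.3f}, specificity: {specificity:.3f}")
print(f"AUC: {roc_auc_score(y_true, y_prob):.3f}")
```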
Comparison With EDURange
Results in EDURange were consistent with those in KYPO CRP. The decision tree again had the highest balanced accuracy (82%) and sensitivity (90%), and the random forest performed well, with a balanced accuracy of 78.6% and an AUC of 85.3%. The slightly lower performance in EDURange suggests that the additional web-interface event data available in KYPO CRP improves predictive power.
Limitations and Threats to Validity
- Internal Validity: The 50% success threshold is arbitrary and affects the results; however, some threshold choice is inherent to any binary classification.
- External Validity: Exercise sessions varied in aspects such as student demographics and delivery modality. This variation may, however, enhance the generalizability of the results.
Implications for Teaching Practice
The classifiers can be deployed to detect unsuccessful students, enabling instructors to provide timely assistance. Even with some misclassifications, the rate of false positives/negatives is manageable in hands-on cybersecurity courses.
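In practice, such a deployment might look like the sketch below: a trained classifier scores students in a live session and flags those whose predicted probability of success falls below an instructor-chosen cutoff. The model choice, features, and 0.5 cutoff are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 25))     # features from past exercise sessions
y_train = rng.integers(0, 2, size=200)   # synthetic success labels
X_live = rng.normal(size=(10, 25))       # features of students in a live session

clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Flag students whose predicted probability of success falls below a cutoff
# chosen by the instructor to trade false alarms against missed students.
p_success = clf.predict_proba(X_live)[:, 1]
flagged = np.flatnonzero(p_success < 0.5)
print("students to check on:", flagged)
```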
Conclusions
Identifying at-risk students is crucial for providing targeted interventions. This study demonstrated that predicting student success based on exercise data is a promising approach that generalizes across contexts.
Open Research Challenges
Future work can explore predicting specific tasks where students will struggle, ranking students based on the likelihood of needing help, and detecting at-risk students as early as possible.
Publicly Available Supplementary Materials
The datasets and scripts for processing exercise logs, extracting features, and training classifiers are publicly available.
Illustrations
Figure 1: Overview of the study design.
Figure 2: Distribution of positive and negative training labels.
Figure 3: Features used for building models from KYPO CRP data.
Figure 4: Features used for building models from EDURange data.
Figure 5: Classifier performance using the 244 data points for KYPO CRP.
Figure 6: Classifier performance using the 69 data points for EDURange.