Authors:
Lyberius Ennio F. Taruc, Arvin R. De La Cruz
Paper:
https://arxiv.org/abs/2408.08694
Introduction
Extracurricular activities are essential in enriching students’ educational experiences by providing opportunities for practical and reflective learning, as well as personal growth. These activities, organized by campus clubs, focus groups, and student organizations, offer students avenues to develop leadership skills, build social connections, and explore interests outside the classroom. This study aims to develop a machine learning workflow that quantifies the effectiveness of student-organized activities based on student emotional responses using sentiment analysis.
Research Background
The Value of Extracurricular Activities and the Role of Student Affairs
Extracurricular activities, though not part of the regular school curriculum, are valuable as they extend students’ academic and non-academic experiences. They help develop important soft skills such as teamwork, leadership, and time management, which are highly sought after by employers. Student affairs offices typically oversee these activities, ensuring they provide recreation, instruction, and exercise without awarding academic credit.
The Problem and its Background
While student activities are integral to educational programs, their effectiveness is rarely quantified explicitly. Existing evaluation methods, such as the n-point Likert scale, capture fixed-scale ratings but do not by themselves give a comprehensive evaluation. A machine learning-based workflow can automate data collection and analysis, making the evaluation of activity success more efficient and less subjective.
On Sentiment Analysis with Machine Learning and Python
Sentiment Analysis (SA) involves analyzing written feedback to determine the sentiment or opinion of the respondent. By working on open-ended comments and classifying each as positive, negative, or neutral, it offers more insight than fixed-scale ratings alone. The Python programming language, with its many machine learning libraries, has become a popular choice for sentiment analysis due to its simplicity and versatility.
Problem Statement
Given the value of student activities and the gap in evaluating them effectively, the research problem is: “What machine learning workflow incorporating sentiment analysis can be developed to quantify the effectiveness of student-organized activities based on student emotional responses?”
Research Overview
Research Objective
The main objective is to develop a machine learning workflow that quantifies the effectiveness of student activities using sentiment analysis. Specific objectives include identifying a language model for sentiment analysis, selecting the key features for generating sentiment scores, designing a workflow for generating an Event Score, and testing the workflow using sample data.
Significance of the Study
The study can help students and organizations understand the effectiveness of their activities, improve future activities, and provide a practical example of applying NLP to real-world scenarios. It also contributes to the growing body of knowledge on the practical applications of NLP in different contexts.
Methodology
The research involves a literature review and the development of a machine learning workflow using pre-trained language models. The workflow includes data collection, preprocessing, model instantiation, feature score aggregation, and Event Score calculation.
Sample Population and Data Set Characteristics
A sample dataset from Organization C, a Recognized Student Organization (RSO) of College X in Manila, Philippines, was used. The dataset consists of individual Post-Activity reports covering twenty-one events for one academic year.
Research Scope and Limitations
The study is limited to analyzing Post-Activity reports from Organization C for the academic year SY 2022-2023. The dataset consists of twenty-one events, and any personally identifiable information irrelevant to the study was removed.
A Review of Related Literature
Applications of Natural Language Processing and Sentiment Analysis
NLP applications have expanded significantly, with uses ranging from chatbots and language translation to governance and policymaking. Sentiment Analysis (SA) focuses on analyzing text data to determine the sentiment or emotion expressed within the text, providing insights into attitudes towards products, services, and more.
Sentiment Analysis in Student Extracurricular Activities and Academia
Several studies suggest that SA can improve the quality of student experiences. For example, automated systems can detect students’ sentiments in social media posts, and sentiment analysis can assess the effectiveness of learning environments and improve overall learning experiences.
Pysentimiento Overview and the BERT Large Language Model (LLM)
Pysentimiento is a Python NLP toolkit designed for opinion mining and social NLP tasks. The study uses the Bidirectional Encoder Representations from Transformers (BERT) LLM, accessed through the pysentimiento toolkit and run as a Hugging Face Transformers pipeline. BERT is a transformer-based pre-trained language model that has significantly improved performance on various NLP tasks.
Workflow Overview, and Data & Results Overview
Features Selection
The key features used in the study are Problems Encountered (P), Recommendations (R), and Conclusion (C). These features are passed to the sentiment analysis model to generate sentiment scores.
Model Instantiation using the Hugging Face Transformer Pipeline
The BERT LLM is invoked through the Hugging Face Transformers pipeline() function, using a model from the pysentimiento toolkit. For each feature, the model returns a polarity label and a score; these values are then aggregated to form an Event Score.
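The model call described above can be sketched as follows. This is only an illustration under stated assumptions: the checkpoint name is the English sentiment model published with pysentimiento and is a guess (the summary does not name the exact model), the POS/NEG/NEU labels are those of that checkpoint, and the helpers `signed_score` and `score_features` are hypothetical names introduced here, not part of either library.

```python
from typing import Callable, Dict

# Assumed checkpoint: the summary does not name the exact model used.
MODEL_NAME = "finiteautomata/bertweet-base-sentiment-analysis"


def load_sentiment_pipeline(model_name: str = MODEL_NAME) -> Callable:
    """Build a Hugging Face sentiment pipeline (downloads the model on first use)."""
    from transformers import pipeline  # imported lazily: heavy dependency
    return pipeline("sentiment-analysis", model=model_name)


def signed_score(result: Dict[str, object]) -> float:
    """Map one pipeline result, e.g. {"label": "NEG", "score": 0.9}, to [-1, 1].

    Hypothetical convention: POS -> +score, NEG -> -score, NEU -> 0.
    """
    sign = {"POS": 1.0, "NEG": -1.0, "NEU": 0.0}[str(result["label"])]
    return sign * float(result["score"])


def score_features(features: Dict[str, str], classify: Callable) -> Dict[str, float]:
    """Score each feature text (P, R, C) with the given classifier."""
    names = list(features)
    results = classify([features[name] for name in names])
    return {name: signed_score(res) for name, res in zip(names, results)}
```

With the `transformers` package installed, `score_features(report, load_sentiment_pipeline())` would return one signed value per feature, where `report` is a dict holding the P, R, and C texts of one event. Passing the classifier in as an argument keeps the scoring logic testable without downloading a model.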
Feature Score Aggregation and Event Score Generation
The Event Score is a weighted average of the scores of the three key features. Problems Encountered (P) is given a lower weight than the other two because its text is generally negative in sentiment by nature. The feature scores are normalized before they are combined into the final Event Score.
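The summary does not reproduce the exact formula, so the following is only a sketch of a weighted average with a down-weighted P. The weights (0.2, 0.4, 0.4), the assumed input range of [-1, 1], and the rescaling to [0, 1] are all illustrative assumptions, not the values used in the paper.

```python
from typing import Dict

# Hypothetical weights: the summary only says P gets a lower weight than R and C.
WEIGHTS: Dict[str, float] = {"P": 0.2, "R": 0.4, "C": 0.4}


def normalize(signed: float) -> float:
    """Rescale a signed sentiment score from [-1, 1] to [0, 1]."""
    if not -1.0 <= signed <= 1.0:
        raise ValueError(f"score out of range: {signed}")
    return (signed + 1.0) / 2.0


def event_score(scores: Dict[str, float], weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted average of the normalized feature scores, in [0, 1]."""
    total_weight = sum(weights.values())
    weighted = sum(weights[name] * normalize(scores[name]) for name in weights)
    return weighted / total_weight
```

Under these assumed weights, an event whose P text is fully negative but whose R and C texts are fully positive still scores about 0.8, which illustrates how a lower weight on P limits its pull on the final score.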
Workflow Overview
The workflow has five steps: data extraction and consolidation, feature extraction using Python’s data science libraries, sentiment analysis using the Hugging Face Transformers library, score normalization, and Event Score calculation.
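The steps above can be sketched end to end as a small driver. This is a simplified outline, not the authors' code: it uses the standard-library `csv` module as a stand-in for the data science libraries mentioned, the column names are hypothetical, and the sentiment scorer and the score-combining rule are passed in as plain functions so that any model or formula can be plugged in.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical column names for the consolidated Post-Activity reports.
FEATURE_COLUMNS = {
    "P": "problems_encountered",
    "R": "recommendations",
    "C": "conclusion",
}


def extract_features(csv_text: str) -> List[Dict[str, str]]:
    """Read consolidated reports; keep the event name and the P, R, C texts."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {"event": row["event"].strip()}
        for key, column in FEATURE_COLUMNS.items():
            record[key] = (row.get(column) or "").strip()
        records.append(record)
    return records


def run_workflow(
    csv_text: str,
    score_text: Callable[[str], float],
    combine: Callable[[Dict[str, float]], float],
) -> List[Dict[str, object]]:
    """Extract features, score each one, then combine the scores per event."""
    table = []
    for record in extract_features(csv_text):
        scores = {key: score_text(record[key]) for key in FEATURE_COLUMNS}
        table.append({"event": record["event"], **scores, "event_score": combine(scores)})
    return table
```

Each returned row holds the event name, one score per feature, and the combined score, which is the shape of table the workflow is described as producing.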
Data, Workflow Results, and General Findings
After processing the dataset through the workflow, the following table is created:
Interestingly, a high P value (the Problems Encountered score, not a statistical p-value) does not necessarily lead to a low Event Score. For example, the “National CpE Challenge” has a satisfactory Event Score despite encountering some problems, likely due to its strengths in other areas.
Conclusion
The study successfully developed a machine learning workflow that quantifies the effectiveness of student activities using sentiment analysis. The BERT LLM proved effective in analyzing sentiment beyond product reviews and post comments. This methodical approach provides a comprehensive understanding of student activities’ effectiveness, offering valuable insights for improving future activities. Future research could explore additional NLP-based use cases and analyze a wider range of data for even more comprehensive results.