Revisiting Reciprocal Recommender Systems: Metrics, Formulation, and Method

Authors:

Chen Yang, Sunhao Dai, Yupeng Hou, Wayne Xin Zhao, Jun Xu, Yang Song, Hengshu Zhu

Paper:

Introduction

Reciprocal Recommender Systems (RRS) have become increasingly significant in enhancing matching efficiency in various domains such as online dating, recruitment, and social networks. Unlike conventional recommender systems that focus on user-item interactions, RRS involves bilateral recommendations between two parties, each with unique preferences and requirements. Despite the growing interest in RRS, existing evaluation methods often reuse conventional ranking metrics, assessing each side of the recommendation process independently. This approach overlooks the collective influence of both sides on the effectiveness of the RRS, necessitating a more holistic evaluation and systemic solution.

In this study, we revisit the task of reciprocal recommendation by introducing new metrics, formulation, and methods. We propose five new evaluation metrics that comprehensively assess the performance of RRS from three distinct perspectives: overall coverage, bilateral stability, and balanced ranking. Additionally, we formulate RRS from a causal perspective, developing a model-agnostic causal reciprocal recommendation method that considers the causal effects of recommendations. We also introduce a reranking strategy to maximize matching outcomes, as measured by the proposed metrics. Extensive experiments on real-world datasets from recruitment and dating scenarios demonstrate the effectiveness of our proposed metrics and approach.

Related Work

Reciprocal Recommender Systems

Reciprocal Recommender Systems (RRS) have been widely studied for two-sided markets, where users are recommended to other users rather than items. RRS are frequently employed in domains where both sides of the market have their own preferences and interactions, such as online dating, recruitment, and social networks. Most RRS evaluate the two recommendation tasks (recommending side-B users to side-A users, and vice versa) separately, using standard top-K recommendation metrics such as Recall, Precision, and Normalized Discounted Cumulative Gain (NDCG). However, these two tasks are two sides of the same coin: whether a match occurs is determined jointly by both sides of the market. Other studies build on the economics and social science literature and model this scenario as a matching market. However, these methods rely heavily on ground-truth relevance, which is often unavailable in real-world datasets.

Causal Inference in Recommendation

Recently, causal inference has gained significant attention in recommender systems research. Current works have explored the application of causal inference to address various challenges in recommender systems, including bias, fairness, and explainability. Causal inference encompasses two primary directions: causal discovery and causal estimation. Causal discovery aims to learn causal relationships from data, identifying causal graphs that reveal interdependencies among factors. Causal estimation seeks to estimate the treatment effect, especially evaluating how interventions or treatments influence user outcomes in recommender systems. Inspired by advances in causal effect estimation, our work concentrates on applying this approach to the domain of reciprocal recommendation, estimating the causal effects from both sides of RRS to enhance recommendation quality.
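Treating each recommendation as an intervention means that a pair of users receives a bilateral treatment, one from each side's list. One minimal way to write this in potential-outcome notation is shown below; the symbols are our own illustrative choice and not necessarily the paper's notation. Because only one of the four potential outcomes is ever observed for a given pair, these effects must be estimated rather than read directly from the data.

```latex
% Illustrative notation (assumed): for a pair (a, b),
% t_{ab} = 1 if b is recommended to a; t_{ba} = 1 if a is recommended to b.
\begin{aligned}
\mathbf{t} &= (t_{ab},\, t_{ba}) \in \{0,1\}^2,
  \qquad Y_{ab}(\mathbf{t}) \in \{0,1\} \ \text{(match or no match)},\\
\tau^{\mathrm{bi}}_{ab} &= Y_{ab}(1,1) - Y_{ab}(0,0),\\
\tau^{A}_{ab} &= Y_{ab}(1,0) - Y_{ab}(0,0),
  \qquad \tau^{B}_{ab} = Y_{ab}(0,1) - Y_{ab}(0,0).
\end{aligned}
```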

Research Methodology

Problem Definition

In reciprocal recommender systems (RRS), there are two parties (typically two sets of users) with mutual selection relationships. The goal is to generate personalized recommendations for users on both sides so as to maximize the overall number of matching pairs. Existing works typically assess the ranking performance of each side independently, even though in a reciprocal scenario the quantity that matters is this overall number of matches. This work therefore evaluates the recommendations of both sides from a system-wide perspective: how many actual matching pairs are covered by the recommendations (coverage), and whether the two users of a pair are recommended to each other (stability).
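Every metric below is defined relative to a set of matching pairs. The following is a minimal sketch that assumes a match is a pair in which each side has expressed positive interest in the other. The function name and the dictionary-of-lists input format are hypothetical choices made for illustration.

```python
def matched_pairs(likes_a, likes_b):
    """Return the set of (a, b) pairs in which each side liked the other.

    likes_a maps each side-A user to the side-B users they liked;
    likes_b maps each side-B user to the side-A users they liked.
    """
    # Orient every interaction as an (a, b) tuple so both sides are comparable.
    a_to_b = {(a, b) for a, bs in likes_a.items() for b in bs}
    b_to_a = {(a, b) for b, as_ in likes_b.items() for a in as_}
    # A match (assumed definition) requires interest in both directions.
    return a_to_b & b_to_a


likes_a = {"a1": ["b1", "b2"], "a2": ["b1"]}
likes_b = {"b1": ["a1"], "b2": ["a2"]}
print(matched_pairs(likes_a, likes_b))  # {('a1', 'b1')}
```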

Proposed Metrics

To ensure a more holistic and robust evaluation, we propose three new evaluation aspects: overall coverage, bilateral stability, and balanced ranking.

Measures of Overall Coverage

Overall coverage refers to the extent to which a recommender system covers potential matching relationships from a system-wide perspective. We propose two metrics, Coverage-adjusted Recall (CRecall) and Coverage-adjusted Precision (CPrecision), which count the successful pairs produced by the RRS without duplicate counting of redundant successful recommendations, such as a matching pair that is recommended on both sides.
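The exact formulas are not reproduced in this summary, so the following is a sketch under stated assumptions. Recommended pairs from both sides are pooled into one set, so a matching pair that appears in both lists is counted once. CRecall divides the covered matches by all matches, and CPrecision (an assumption on our part) divides them by the total number of recommendation slots issued. All function names are hypothetical.

```python
def oriented_pairs(rec_a, rec_b):
    """Split recommendations into (a, b)-oriented pair sets, one per side."""
    from_a = {(a, b) for a, bs in rec_a.items() for b in bs}
    from_b = {(a, b) for b, as_ in rec_b.items() for a in as_}
    return from_a, from_b


def naive_hits(rec_a, rec_b, matches):
    """Side-by-side hit count: a match shown on both sides counts twice."""
    from_a, from_b = oriented_pairs(rec_a, rec_b)
    return len(from_a & matches) + len(from_b & matches)


def crecall_cprecision(rec_a, rec_b, matches):
    """Coverage-adjusted recall/precision (assumed reading, see lead-in)."""
    from_a, from_b = oriented_pairs(rec_a, rec_b)
    # Union over both sides: each matching pair is counted at most once.
    covered_hits = (from_a | from_b) & matches
    # Assumed precision denominator: every recommendation slot issued.
    slots = sum(len(v) for v in rec_a.values()) + sum(len(v) for v in rec_b.values())
    crecall = len(covered_hits) / len(matches) if matches else 0.0
    cprecision = len(covered_hits) / slots if slots else 0.0
    return crecall, cprecision


matches = {("a1", "b1"), ("a2", "b2")}
rec_a = {"a1": ["b1", "b3"], "a2": ["b1"]}
rec_b = {"b1": ["a1"], "b2": ["a1"]}
print(naive_hits(rec_a, rec_b, matches))          # 2
print(crecall_cprecision(rec_a, rec_b, matches))  # (0.5, 0.2)
```

The pair (a1, b1) is a hit on both sides, so the naive count reports two successes although only one match can result; the pooled count reports one.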

Measures of Bilateral Stability

Bilateral stability refers to the extent to which a recommender system simultaneously recommends a pair of users to each other. We propose two metrics, Stability-adjusted Recall (SRecall) and Stability-adjusted Precision (SPrecision), to effectively evaluate the bilateral stability of RRS.
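A corresponding sketch for stability, again under assumptions since the formulas are not given here: a pair counts as bilaterally recommended only if each user appears in the other's list. SRecall divides stable matches by all matches, and SPrecision (our assumed denominator) divides them by all bilaterally recommended pairs. The function name is hypothetical.

```python
def srecall_sprecision(rec_a, rec_b, matches):
    """Stability-adjusted recall/precision (assumed reading, see lead-in)."""
    from_a = {(a, b) for a, bs in rec_a.items() for b in bs}
    from_b = {(a, b) for b, as_ in rec_b.items() for a in as_}
    # Bilaterally recommended: each user appears in the other's list.
    mutual = from_a & from_b
    stable_hits = mutual & matches
    srecall = len(stable_hits) / len(matches) if matches else 0.0
    # Assumed denominator: all bilaterally recommended pairs.
    sprecision = len(stable_hits) / len(mutual) if mutual else 0.0
    return srecall, sprecision


matches = {("a1", "b1"), ("a2", "b2")}
rec_a = {"a1": ["b1"], "a2": ["b2"]}
rec_b = {"b1": ["a1"], "b2": ["a1"]}
print(srecall_sprecision(rec_a, rec_b, matches))  # (0.5, 1.0)
```

Here (a2, b2) is recommended only from side A, so it contributes to coverage but not to stability.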

Measures of Balanced Ranking

Balanced ranking refers to the ranking quality of the recommendation lists on both sides, ensuring equity between sides of different sizes. We propose Reciprocal NDCG (RNDCG), built on the widely used ranking metric NDCG, which evaluates overall ranking performance while accounting for the different weights of the two sides.
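How RNDCG weights the two sides is not specified in this summary. The sketch below assumes it is a convex combination of per-side binary-relevance NDCG@K with a hypothetical weight `w_a`: 0.5 gives each side an equal say regardless of size, while `w_a = |A| / (|A| + |B|)` would instead weight by group size. Users with no matches are skipped, which is also an assumption.

```python
import math


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k for one ranked list."""
    dcg = sum(1.0 / math.log2(i + 2) for i, x in enumerate(ranked[:k]) if x in relevant)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0


def side_ndcg(recs, relevant_of, k):
    """Mean NDCG@k over users of one side that have at least one match."""
    scores = [ndcg_at_k(lst, relevant_of[u], k)
              for u, lst in recs.items() if relevant_of.get(u)]
    return sum(scores) / len(scores) if scores else 0.0


def rndcg(rec_a, rec_b, matches, k, w_a=0.5):
    """Hypothetical RNDCG: weighted combination of the two per-side NDCG@k values."""
    rel_a, rel_b = {}, {}
    for a, b in matches:
        rel_a.setdefault(a, set()).add(b)
        rel_b.setdefault(b, set()).add(a)
    return w_a * side_ndcg(rec_a, rel_a, k) + (1 - w_a) * side_ndcg(rec_b, rel_b, k)


matches = {("a1", "b1")}
rec_a = {"a1": ["b1", "b2"]}   # match ranked first on side A
rec_b = {"b1": ["a2", "a1"]}   # match ranked second on side B
print(round(rndcg(rec_a, rec_b, matches, k=50), 4))  # 0.8155
```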

Experimental Design

Experimental Setup

To evaluate the effectiveness of our method, we conduct experiments on two large real-world datasets from different reciprocal recommendation domains: recruitment and online dating. The datasets are preprocessed and split into training, validation, and test sets. We compare our proposed causal reciprocal recommendation method (CRRS) with various baseline models, including BPRMF, LightGCN, D-BPRMF, LFRR, D-LightGCN, and DPGNN. We use both traditional metrics (Recall, Precision, NDCG) and the proposed metrics (CRecall, CPrecision, SRecall, SPrecision, RNDCG) to evaluate the top-K recommendation performance of each RRS.

Implementation Details

We implement the baseline models using RecBole, a popular open-source recommendation library. For a fair comparison, the user embedding dimension is set to 128 for all models, and K is set to 50 empirically. All methods are optimized with the Adam optimizer, and we conduct a thorough hyperparameter search for all baselines; the learning rate is tuned over {0.001, 0.0001, 0.00001}. To prevent overfitting, we apply early stopping with a patience of 30 epochs.
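The training protocol above can be sketched in plain Python. This is not RecBole's API: the dictionary keys and the helpers `train_with_early_stopping` and `grid_search_lr` are hypothetical, and `train_one_epoch`, `evaluate`, and `run` stand in for whatever model code is actually used.

```python
# Settings reported in the text; the key names are illustrative, not RecBole's.
CONFIG = {
    "embedding_size": 128,
    "topk": 50,
    "optimizer": "adam",
    "learning_rates": [1e-3, 1e-4, 1e-5],
    "patience": 30,
}


def train_with_early_stopping(train_one_epoch, evaluate, max_epochs, patience):
    """Stop once the validation score has not improved for `patience` epochs."""
    best_epoch, best_score, waited = -1, float("-inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        score = evaluate(epoch)
        if score > best_score:
            best_epoch, best_score, waited = epoch, score, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best_score


def grid_search_lr(run, learning_rates):
    """Call `run(lr) -> (best_epoch, best_score)` per rate; keep the best score."""
    results = {lr: run(lr) for lr in learning_rates}
    best_lr = max(results, key=lambda lr: results[lr][1])
    return best_lr, results[best_lr]


# Usage with a fake validation curve (patience shortened to 3 for the demo).
curve = [0.10, 0.30, 0.20, 0.25, 0.29, 0.40]
print(train_with_early_stopping(lambda e: None, lambda e: curve[e], len(curve), 3))  # (1, 0.3)
```

The demo also shows the cost of a short patience: training stops before reaching the later 0.40 score, which is one reason a generous patience such as 30 epochs is common.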

Results and Analysis

Overall Performance

We conduct a comprehensive comparison of all methods across the evaluation metrics. CRRS achieves the best results on the overall coverage metrics (CRecall and CPrecision) and on the number of matching pairs, indicating that it outperforms the baseline models in overall performance.

Metric Comparative Analysis

We compare traditional metrics with the proposed metrics under different recommendation strategies. Traditional metrics fail to distinguish among these strategies, whereas the proposed metrics (CRecall and SRecall) capture the differences, providing a more comprehensive evaluation of RRS.
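A toy illustration of why per-side metrics can miss these differences, reusing the assumed definitions of CRecall and SRecall sketched earlier (CRecall pools both sides' recommended pairs; SRecall keeps only pairs recommended in both directions). The two hypothetical strategies share side A's lists and differ only on side B; the data are invented for illustration, not taken from the paper's experiments.

```python
def per_side_recall(recs, relevant_of):
    """Conventional Recall, averaged over users of one side with at least one match."""
    scores = [len(set(lst) & relevant_of[u]) / len(relevant_of[u])
              for u, lst in recs.items() if relevant_of.get(u)]
    return sum(scores) / len(scores) if scores else 0.0


def evaluate(rec_a, rec_b, matches):
    """Per-side Recall next to the assumed CRecall and SRecall."""
    rel_a, rel_b = {}, {}
    for a, b in matches:
        rel_a.setdefault(a, set()).add(b)
        rel_b.setdefault(b, set()).add(a)
    from_a = {(a, b) for a, bs in rec_a.items() for b in bs}
    from_b = {(a, b) for b, as_ in rec_b.items() for a in as_}
    return {
        "Recall_A": per_side_recall(rec_a, rel_a),
        "Recall_B": per_side_recall(rec_b, rel_b),
        "CRecall": len((from_a | from_b) & matches) / len(matches),
        "SRecall": len((from_a & from_b) & matches) / len(matches),
    }


matches = {("a1", "b1"), ("a2", "b2")}
rec_a = {"a1": ["b1"], "a2": ["b3"]}
mutual = {"b1": ["a1"], "b2": ["a3"]}         # side B repeats the match side A covers
complementary = {"b1": ["a3"], "b2": ["a2"]}  # side B covers the other match
print(evaluate(rec_a, mutual, matches))
# {'Recall_A': 0.5, 'Recall_B': 0.5, 'CRecall': 0.5, 'SRecall': 0.5}
print(evaluate(rec_a, complementary, matches))
# {'Recall_A': 0.5, 'Recall_B': 0.5, 'CRecall': 1.0, 'SRecall': 0.0}
```

Both strategies have identical per-side Recall, yet one covers twice as many matches while the other is the only one that recommends a matching pair to each other.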

Ablation Study

We analyze the contribution of the key components of CRRS: the training algorithm based on the potential outcome framework and the proposed reranking strategy. The results show that every component contributes to performance, although its impact varies across datasets.

Ranking Analysis of Redundant Recommendations

We analyze where redundant recommendations fall in the ranking lists. Most redundant recommendations occupy top-ranked positions, which suggests that reranking strategies are needed to use these valuable slots for more effective and balanced outcomes.
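To make the idea concrete, here is a hypothetical reranking heuristic; it is a simple swapped-in illustration and not the paper's CRRS reranking strategy, which is not specified in this summary. Given side A's lists, it moves side-B candidates whose pair is already covered from side A toward the bottom, so uncovered candidates take the freed top slots. The input candidate lists may be longer than `k`.

```python
def demote_redundant(rec_b, rec_a, k):
    """Hypothetical reranker: push side-B candidates already covered by side A down.

    Candidate `a` in user b's list is redundant if b already appears in a's list.
    """
    covered = {(a, b) for a, bs in rec_a.items() for b in bs}
    reranked = {}
    for b, candidates in rec_b.items():
        # sorted() is stable, so relative order is kept within each group;
        # False (not yet covered) sorts before True (already covered).
        ordered = sorted(candidates, key=lambda a: (a, b) in covered)
        reranked[b] = ordered[:k]
    return reranked


rec_a = {"a1": ["b1"]}
rec_b = {"b1": ["a1", "a2", "a3"]}
print(demote_redundant(rec_b, rec_a, k=2))  # {'b1': ['a2', 'a3']}
```

This heuristic trades stability for coverage: demoting (a1, b1) on side B can raise CRecall but lowers SRecall, so a practical reranker would need to balance the two.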

Overall Conclusion

This paper revisited the study of reciprocal recommender systems by introducing new metrics, a new formulation, and a new method. We proposed five metrics for evaluating RRS from three new aspects: overall coverage, bilateral stability, and balanced ranking. We formulated reciprocal recommendation tasks from a causal perspective, treating recommendations as bilateral interventions. Furthermore, we proposed a causal reciprocal recommendation model based on the potential outcome framework and designed a reranking strategy to further improve overall performance. Extensive experiments on two real-world datasets show that the proposed approach outperforms the baselines in overall performance.