Authors:

Musa Taib, Jiajun Wu, Steve Drew, Geoffrey G. Messier

Paper:

https://arxiv.org/abs/2408.07845

Introduction

The Housing and Homelessness System of Care (HHSC) aims to connect individuals experiencing homelessness with supportive housing. This system comprises various agencies, each with different information technology platforms, leading to isolated data silos. Larger agencies can train and test artificial intelligence (AI) tools due to their extensive data, but smaller agencies often lack this capability. This paper introduces a Federated Learning (FL) approach to enable all agencies to collaboratively train a predictive model without sharing sensitive data, thereby ensuring equitable access to quality AI tools while preserving privacy.

Background

Emergency housing shelters in North America were initially designed for short-term stays but now often serve individuals for extended periods due to various challenges. The Housing First philosophy emphasizes quickly connecting individuals to supportive housing to improve their chances of exiting the shelter system. However, matching shelter users to supportive housing is complex due to the limited availability of supportive housing spaces and the need to prioritize chronic and episodic shelter users.

Machine learning (ML) can rapidly identify first-time shelter users at risk of becoming chronic or episodic users, using only the initial months of shelter records. The goal is not to automate housing decisions but to assist human staff in identifying at-risk individuals who might be overlooked by other programs.

Challenges in Equitable Access to ML

HHSCs consist of numerous agencies, each maintaining its own administrative database. Predicting homelessness risk is most accurate when training on data capturing interactions across all agencies. However, merging datasets is challenging due to privacy concerns, incompatible IT systems, and the absence of unique identity numbers for individuals across agencies. Smaller agencies, with limited data, struggle to train effective ML models, creating an equity issue in access to high-quality ML tools.

Federated Learning Approach

The proposed solution is a Federated Learning (FL) approach that uses the agencies' disconnected datasets while respecting privacy. The data is horizontally partitioned: each agency's dataset forms one partition, describing the individuals that agency serves with the same kind of shelter stay features. Because the model is trained only on shelter stay data, the approach is intended to generalize across HHSCs and to be equally usable by large and small agencies.
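As a small illustration of horizontal partitioning, the sketch below groups toy records by agency. The field names (`agency`, `stays`, `episodes`) and the values are hypothetical and are not taken from the paper's dataset; the point is only that every partition has the same columns but different rows.

```python
from collections import defaultdict

# Hypothetical toy records: same feature columns everywhere, different rows per agency.
records = [
    {"agency": "A", "stays": 3, "episodes": 1},
    {"agency": "A", "stays": 210, "episodes": 2},
    {"agency": "B", "stays": 28, "episodes": 9},
]

def partition_by_agency(rows):
    """Group rows by their agency; each group is one horizontal partition."""
    partitions = defaultdict(list)
    for row in rows:
        # Drop the agency key so each partition holds feature columns only.
        features = {k: v for k, v in row.items() if k != "agency"}
        partitions[row["agency"]].append(features)
    return dict(partitions)

partitions = partition_by_agency(records)
```

In an FL deployment each partition would never leave its agency; the grouping here exists only to make the layout visible.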

Labeling Process

The labeling process uses historical shelter access records to compute the number of stays and the number of episodes for each individual. A k-means clustering algorithm then assigns each individual one of three labels (transitional, chronic, or episodic) based on these two values. In the FL setting, a decentralized k-means algorithm performs this clustering across agencies without merging their records, preserving privacy.
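One common way to decentralize k-means is for each agency to report only per-cluster sums and counts, which a coordinator combines into new centroids. The sketch below follows that pattern; it is an assumption about how such a scheme can work, not the paper's exact algorithm, and the function names, toy data, and initial centroids are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (number of stays, number of episodes)

def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda k: (point[0] - centroids[k][0]) ** 2 + (point[1] - centroids[k][1]) ** 2,
    )

def local_stats(points: Sequence[Point], centroids: Sequence[Point]):
    """Runs at one agency: per-cluster coordinate sums and counts, no raw records."""
    sums = [[0.0, 0.0] for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        j = nearest(p, centroids)
        sums[j][0] += p[0]
        sums[j][1] += p[1]
        counts[j] += 1
    return sums, counts

def federated_kmeans(agencies: Sequence[Sequence[Point]],
                     centroids: Sequence[Point],
                     rounds: int = 10) -> List[Point]:
    """Coordinator loop: combine agency statistics and update the centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(rounds):
        total = [[0.0, 0.0] for _ in centroids]
        count = [0] * len(centroids)
        for points in agencies:
            sums, counts = local_stats(points, centroids)
            for j in range(len(centroids)):
                total[j][0] += sums[j][0]
                total[j][1] += sums[j][1]
                count[j] += counts[j]
        # Keep the old centroid if no point was assigned to a cluster.
        centroids = [
            (total[j][0] / count[j], total[j][1] / count[j]) if count[j] else centroids[j]
            for j in range(len(centroids))
        ]
    return centroids

# Hypothetical toy data for two agencies.
agency_a = [(2.0, 1.0), (3.0, 1.0), (200.0, 2.0)]
agency_b = [(4.0, 1.0), (30.0, 8.0), (250.0, 1.0), (28.0, 9.0)]
init = [(1.0, 1.0), (30.0, 8.0), (200.0, 1.0)]

federated = federated_kmeans([agency_a, agency_b], init)
pooled = federated_kmeans([agency_a + agency_b], init)
```

Because sums and counts add up across agencies, the federated centroids match those obtained by clustering the pooled records from the same starting point, which is why no record-level merge is needed.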

Methodology

Problem Formulation

The task is formulated as a multi-class prediction problem, where the goal is to train a classifier to predict whether an individual will become a chronic, episodic, or transitional shelter user based on their shelter stay patterns.

Training Details

Three scenarios are considered:
1. Centralized: All data is merged centrally, and the model is trained on this dataset.
2. Federated: Agencies collaboratively train a single model using FL, avoiding data merging.
3. Isolated: Each agency trains its own model independently.

The FL framework follows the FedAvg algorithm, involving decentralized labeling, local model training, and aggregation of local models into a global model.
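The aggregation step of FedAvg can be sketched as a weighted average of the agencies' model parameters, with each agency weighted by its number of training samples. The code below is a minimal illustration under that standard definition; the flat parameter lists and the name `fedavg` are simplifications chosen here, not the paper's implementation.

```python
from typing import List, Sequence

def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Average parameter vectors, weighting each client by its sample count."""
    if len(client_params) != len(client_sizes) or not client_params:
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all clients must share the same model shape")
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]

# Hypothetical example: a small agency (1 sample) and a larger one (3 samples).
global_params = fedavg([[1.0, 0.0], [3.0, 4.0]], [1, 3])
```

In a full training run this step would repeat every round: the coordinator sends the global parameters to the agencies, each agency trains locally on its own data, and the updated parameters are averaged again.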

Evaluation

Experimental Setup

Experiments were conducted using anonymized shelter data from the Calgary Homeless Foundation, covering 6,840,069 sleep records for 50,455 individuals across 8 shelters. The dataset was preprocessed to create features based on shelter stay patterns and labeled using the decentralized k-means algorithm.

Hyperparameter Selection

Key hyperparameters include the observation window (To), the number of time bins (Tb), and the prediction window (Tp). The best performance was achieved with larger To and Tb values and a smaller Tp value.
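To make the roles of To and Tb concrete, the sketch below turns one person's shelter stay days into Tb counts covering the first To days. This is one plausible feature construction and an assumption made here; the paper's exact features may differ, and Tp is not modeled. The name `binned_stay_counts` and the sample values are hypothetical.

```python
from typing import List, Sequence

def binned_stay_counts(stay_days: Sequence[int], t_obs: int, n_bins: int) -> List[int]:
    """Count stays per time bin within the observation window.

    stay_days: days since the person's first shelter stay (0-based).
    t_obs:     observation window length in days (To).
    n_bins:    number of equal-width time bins (Tb).
    """
    if t_obs <= 0 or n_bins <= 0:
        raise ValueError("t_obs and n_bins must be positive")
    counts = [0] * n_bins
    for day in stay_days:
        if 0 <= day < t_obs:  # ignore stays outside the observation window
            counts[day * n_bins // t_obs] += 1
    return counts

# Hypothetical example: a 90-day window split into 3 bins of 30 days.
features = binned_stay_counts([0, 1, 2, 35, 89, 120], t_obs=90, n_bins=3)
```

A longer window or more bins gives the classifier a finer view of the stay pattern, at the cost of waiting longer before a prediction can be made.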

Performance Comparison

The Centralized scenario achieved the best performance, but the Federated model performed nearly as well, demonstrating the feasibility of FL in this context. The Isolated scenario performed the worst, particularly for smaller agencies with limited data.

Discussion

Centralized vs. Federated

While the Centralized model outperforms others, it is often impractical due to privacy and logistical challenges. The Federated approach offers a viable compromise, providing good performance without requiring centralized data merging.

Achieving Equity with Federated Learning

FL addresses the disparity in data analytics capabilities between large and small agencies. Smaller agencies benefit significantly from the aggregated insights available through FL, ensuring equitable access to high-quality ML tools.

Reducing the Need for Data Linkage

FL eliminates the need for direct data linkage, reducing privacy concerns and logistical complexities. This approach allows agencies to collaborate on model training without sharing sensitive information.

Conclusion

This study demonstrates the potential of Federated Learning to enhance equitable access to AI tools in the HHSC. By enabling collaborative model training without data sharing, FL addresses privacy concerns and logistical challenges, promoting equity across the system. Future research can build on this framework to develop more robust ML applications in social services.
