Privacy-Preserving K-means Clustering on Distributed Datasets using Secure Multi-Party Computation

Project Code: 25P4U19

Abstract

This research explores the design and implementation of a privacy-preserving k-means clustering algorithm for large-scale datasets using a MapReduce framework and Secure Multi-Party Computation (MPC). The primary objective is to enable efficient and accurate clustering while keeping the underlying data confidential. We use MPC techniques to perform computations on encrypted data, so that no party sees another party's raw records and the risk of data breaches is reduced. The results demonstrate a viable approach to privacy-preserving clustering in distributed environments, achieving accuracy comparable to traditional k-means with strong privacy guarantees. Future work will focus on optimizing performance and scalability for even larger datasets.

Introduction

K-means clustering is a widely used unsupervised machine learning technique with applications across many domains. Applying k-means to large datasets, however, often requires a distributed computing framework such as MapReduce. At the same time, growing concerns about data privacy call for methods that allow computation on sensitive data without revealing its contents. Existing methods either compromise privacy or lack scalability. This research addresses that gap by proposing a privacy-preserving k-means algorithm that combines the efficiency of MapReduce with the security of MPC, aiming to achieve both scalability and privacy.
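To make the MapReduce formulation concrete, one k-means iteration can be written as a map step (assign each point to its nearest centroid) and a reduce step (sum the points and counts per cluster, then divide). The following is a minimal sketch in plain Python with no privacy protection and no actual MapReduce framework; the function names and the toy data are illustrative assumptions, not the project's code.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )

def map_phase(partition, centroids):
    """Map: emit (cluster_id, (point, 1)) for every point in one partition."""
    return [(nearest(p, centroids), (p, 1)) for p in partition]

def reduce_phase(pairs, centroids):
    """Reduce: accumulate sums and counts per cluster, then average."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for cid, (point, n) in pairs:
        counts[cid] += n
        for d in range(dim):
            sums[cid][d] += point[d]
    # Keep the old centroid if a cluster received no points.
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(len(centroids))
    ]

def kmeans_iteration(partitions, centroids):
    """Run the map step on each partition, then one global reduce step."""
    pairs = [kv for part in partitions for kv in map_phase(part, centroids)]
    return reduce_phase(pairs, centroids)

# Toy data: two partitions standing in for two data holders (hypothetical).
partitions = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
centroids = [(1.0, 1.0), (9.0, 9.0)]
new_centroids = kmeans_iteration(partitions, centroids)
```

The reduce step needs only per-cluster sums and counts, which is why the aggregation is a natural place to apply a secure computation protocol.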

Objectives

Implement a privacy-preserving k-means clustering algorithm using a MapReduce framework.
Integrate secure multi-party computation (MPC) protocols to protect data privacy during computation.
Evaluate the performance and accuracy of the proposed algorithm against a baseline k-means implementation.
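The source does not state which MPC protocol is used. As one illustration of the second objective, the sketch below assumes additive secret sharing over a prime field, a common building block for secure summation in the semi-honest setting. All parties are simulated in a single process, and the constants `PRIME` and `SCALE` and the function names are assumptions made for this example.

```python
import secrets

PRIME = 2**61 - 1  # field modulus (assumed; any sufficiently large prime works)
SCALE = 10**6      # fixed-point scale for encoding real values as integers

def encode(x):
    """Encode a real number as a field element using fixed-point scaling."""
    return round(x * SCALE) % PRIME

def decode(v):
    """Decode a field element; values above PRIME // 2 represent negatives."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE

def share(value, n_parties):
    """Split value into n_parties random shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def secure_sum(private_values):
    """Simulate a secure sum: only partial sums of shares are ever combined."""
    n = len(private_values)
    # Each party splits its own value into n shares, one per party.
    all_shares = [share(encode(v), n) for v in private_values]
    # Party i adds the i-th share from every party and publishes that total.
    partials = [sum(s[i] for s in all_shares) % PRIME for i in range(n)]
    return decode(sum(partials) % PRIME)

# Hypothetical per-party sums for one coordinate of one cluster.
local_sums = [12.5, -3.25, 7.0]
total = secure_sum(local_sums)
```

In a k-means setting, each party could apply such a sum to its per-cluster coordinate sums and point counts, so that only the aggregated totals needed to update the centroids are revealed. A real deployment would also need authenticated channels between parties and a protocol for the distance comparisons, which this sketch omits.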

Project Information

Domain: Cybersecurity, Data Classification

Year: 2025

Technologies: Python, Data Analysis, Case Study Research, Visualization Tools

Platform: Cross-platform (Web-based or Desktop tool)
