Efficient Parallel Skyline Query Processing using MapReduce

Project Code: 25P4U22

Abstract

This research explores efficient skyline query processing with the MapReduce framework. Skyline queries return the set of data points that are not dominated by any other point, and they are computationally expensive on large datasets. To address this, the project proposes a parallel algorithm that uses the distributed nature of MapReduce to spread the work across a Hadoop cluster and reduce query processing time. The results show a substantial performance improvement over traditional sequential methods, making skyline query processing feasible for the massive datasets common in big data applications. The proposed system is scalable and adaptable to different data distributions.

Introduction

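Skyline queries are important in many applications, including database systems, decision support systems, and location-based services. A point dominates another if it is at least as good in every attribute and strictly better in at least one. The skyline is the set of points that no other point dominates. For example, when choosing a hotel by price and distance to the beach, a hotel that is both cheaper and closer than another makes the second one an inferior option. Identifying the non-dominated points therefore lets users quickly filter out such options. However, the cost of skyline computation grows rapidly with dataset size. A naive approach compares every pair of points and needs O(n²) dominance tests in the worst case, which makes large-scale datasets hard to process efficiently. Existing algorithms often struggle with I/O bottlenecks and a lack of parallelism. This research aims to overcome these limitations by using MapReduce, a widely adopted paradigm for processing big data.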
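The following minimal Java sketch makes the dominance definition concrete. It implements the dominance test and the classic block-nested-loop (BNL) skyline. It assumes that every attribute is "smaller is better" (such as price or distance) and that the whole candidate window fits in memory. The class name, method names, and hotel data are illustrative only and are not taken from the project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: assumes every attribute is "smaller is better".
final class SkylineBnl {

    /** True if p dominates q: p <= q in every dimension and p < q in at least one. */
    static boolean dominates(double[] p, double[] q) {
        boolean strictlyBetter = false;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > q[i]) {
                return false;          // p is worse in dimension i
            }
            if (p[i] < q[i]) {
                strictlyBetter = true; // p is better in at least one dimension
            }
        }
        return strictlyBetter;
    }

    /** Block-nested-loop skyline with an in-memory window: O(n^2) tests in the worst case. */
    static List<double[]> skyline(List<double[]> points) {
        List<double[]> window = new ArrayList<>();
        for (double[] candidate : points) {
            boolean dominated = false;
            for (double[] w : window) {
                if (dominates(w, candidate)) {
                    dominated = true;
                    break;
                }
            }
            if (dominated) {
                continue;              // candidate can never be in the skyline
            }
            // candidate survives: evict window points that it dominates
            window.removeIf(w -> dominates(candidate, w));
            window.add(candidate);
        }
        return window;
    }

    public static void main(String[] args) {
        // Hypothetical hotels: {price, distance to beach in km}
        List<double[]> hotels = Arrays.asList(
            new double[]{100, 2.0}, new double[]{80, 3.0},
            new double[]{120, 1.0}, new double[]{110, 2.5},
            new double[]{90, 3.5});
        for (double[] h : skyline(hotels)) {
            System.out.println(Arrays.toString(h));
        }
    }
}
```

With the hypothetical hotel data, {110, 2.5} is dominated by {100, 2.0} and {90, 3.5} is dominated by {80, 3.0}. The three remaining hotels form the skyline. The inner loop over the window is the source of the quadratic cost that a parallel design tries to spread across nodes.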

Objectives

Develop a MapReduce-based algorithm for efficient skyline query processing.
Evaluate the performance of the proposed algorithm in terms of execution time and scalability.
Compare the performance of the proposed algorithm with existing skyline query algorithms.
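This page does not describe the project's exact algorithm. As a point of reference, one common way to parallelize skyline computation in MapReduce uses two phases. In the map phase, mappers compute a local skyline for each partition of the data. In the reduce phase, a single reducer merges the local skylines and computes the global skyline. Discarding points early is safe because a point dominated inside its own partition is also dominated globally. Any globally dominated point is also dominated, by transitivity, by some local skyline point, so the reducer only needs the local skylines. The sketch below simulates both phases in plain Java, using a parallel stream in place of Hadoop mappers. The round-robin partitioning and all names are hypothetical assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical two-phase skyline in the MapReduce style, simulated without Hadoop.
// Assumes "smaller is better" on every attribute and round-robin partitioning.
final class TwoPhaseSkyline {

    /** True if p dominates q: p <= q in every dimension and p < q in at least one. */
    static boolean dominates(double[] p, double[] q) {
        boolean strictlyBetter = false;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > q[i]) return false;
            if (p[i] < q[i]) strictlyBetter = true;
        }
        return strictlyBetter;
    }

    /** Sequential block-nested-loop skyline, used by both phases. */
    static List<double[]> localSkyline(List<double[]> points) {
        List<double[]> window = new ArrayList<>();
        for (double[] c : points) {
            if (window.stream().anyMatch(w -> dominates(w, c))) continue;
            window.removeIf(w -> dominates(c, w)); // evict points c dominates
            window.add(c);
        }
        return window;
    }

    static List<double[]> skyline(List<double[]> points, int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be >= 1");
        }
        // "Map" phase, step 1: assign points to partitions round-robin.
        List<List<double[]>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) partitions.add(new ArrayList<>());
        for (int i = 0; i < points.size(); i++) {
            partitions.get(i % numPartitions).add(points.get(i));
        }
        // "Map" phase, step 2: local skylines computed in parallel, then gathered.
        List<double[]> candidates = partitions.parallelStream()
            .map(TwoPhaseSkyline::localSkyline)
            .flatMap(List::stream)
            .collect(Collectors.toList());
        // "Reduce" phase: one reducer filters the merged local skylines.
        return localSkyline(candidates);
    }
}
```

Round-robin partitioning is used here only for simplicity and does not exploit the data distribution. With anti-correlated data, the local skylines can stay large, and the single reducer then becomes a bottleneck. Distribution-aware partitioning is one way to address this.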

Project Information

Domain: Big Data, Data Mining, Parallel Computing

Year: 2025

Technologies: Java, Hadoop, MapReduce, HDFS

Platform: Linux, Hadoop Cluster
