Big Data Processing on Cloud Computing Architectures for Hyperspectral Remote Sensing

doi:10.1201/9781315159331-4

ABSTRACT

Hyperspectral images comprise hundreds of contiguous spectral bands, which imposes significant storage and data processing requirements. New hyperspectral missions that produce large amounts of data on a daily basis pose important challenges for the scalable and efficient processing of big hyperspectral data across different applications. Cloud computing technologies are in high demand for big hyperspectral remote sensing data processing because of their advanced capabilities for internet-scale, service-oriented, and high-performance computing. They offer the potential to tackle massive data processing workloads by means of distributed parallel architectures.

In this chapter, we first introduce the basic concepts of and advances in cloud computing, as well as the fundamentals of cloud computing technologies. We then present a parallel and distributed framework for massive hyperspectral data processing based on cloud computing architectures. Specifically, we use spatial correlation regularized sparse representation classification (SCSRC) as a case study to demonstrate the applicability and efficiency of cloud computing technologies for distributed parallel processing of big hyperspectral data. To that end, we develop a parallel and distributed implementation of the SCSRC algorithm in a cloud environment, using the Hadoop Distributed File System (HDFS) and Apache Spark together with a MapReduce methodology. We evaluate the implementation in terms of accuracy and parallel execution performance. Finally, we discuss future research lines for big hyperspectral remote sensing data processing on cloud computing architectures, such as task scheduling strategies and load balancing mechanisms.
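
To make the map-reduce methodology mentioned above concrete, the following is a minimal single-machine sketch in plain Python. It is not the chapter's SCSRC implementation and does not use Spark or HDFS: a nearest-class-mean rule stands in for the classifier, and all names (`partition`, `classify_pixel`, `map_partition`, `reduce_counts`, `run`) are hypothetical, chosen only for illustration. The point is the data flow: pixels are split into partitions, each partition is classified independently (map), and the partial results are merged (reduce).

```python
from collections import Counter
from functools import reduce


def partition(pixels, n_parts):
    """Split a list of pixel spectra into at most n_parts nearly equal chunks."""
    size = max(1, -(-len(pixels) // n_parts))  # ceiling division, at least 1
    return [pixels[i:i + size] for i in range(0, len(pixels), size)]


def classify_pixel(spectrum, class_means):
    """Stand-in classifier (assumption): nearest class mean, squared Euclidean."""
    def dist(label):
        return sum((s - m) ** 2 for s, m in zip(spectrum, class_means[label]))
    return min(class_means, key=dist)


def map_partition(chunk, class_means):
    """Map step: classify every pixel in one partition and count the labels."""
    return Counter(classify_pixel(p, class_means) for p in chunk)


def reduce_counts(a, b):
    """Reduce step: merge two partial label counts."""
    return a + b


def run(pixels, class_means, n_parts=2):
    """Apply the map step per partition, then fold the partial results."""
    partials = [map_partition(c, class_means) for c in partition(pixels, n_parts)]
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    # Toy two-band "spectra" and two made-up class means.
    pixels = [[0.1, 0.2], [0.9, 0.8], [0.0, 0.1], [1.0, 1.0], [0.2, 0.1]]
    class_means = {"water": [0.0, 0.0], "soil": [1.0, 1.0]}
    print(run(pixels, class_means))
```

In a cluster setting, the partitions would live on different workers and the map step would run in parallel; here the list comprehension in `run` simply plays that role sequentially.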