Cloud Computing Research for Big Data Analytics
Cloud Computing, as a disruptive technology, provides a dynamic, elastic and promising computing environment to tackle the challenges of big data analytics. The Cloud Computing Lab led by Dr. Lei Huang at Prairie View A&M University (PVAMU), northwest of Houston and part of the Texas A&M University System, is conducting research to build a scalable Cloud Computing PaaS (Platform as a Service) for big data processing and analytics. The big data analytics Cloud is built on top of Apache Spark and Hadoop, with the capability to store and process large amounts of data. We have also created multiple big data processing templates that hide the complexity of parallelism from users. The platform provides a high-level, template-based programming environment; supports building, running and monitoring jobs on the Spark engine; and embeds sophisticated data analytics algorithms based on image processing, signal processing, statistics and machine learning packages. All of these Cloud services are delivered to users through a user-friendly web interface that allows access anytime, anywhere and from any device. This work builds on the NSF-sponsored Image Processing Cloud project, led by PVAMU in collaboration with the University of Houston (UH) and the University of Delaware (UD).
The goal of this work is to build an innovative domain-specific cloud that accelerates scientific research and discovery in big data processing domains, including image processing, seismic data processing, and data analytics. This ongoing research comprises the following software components:
- a web-based high-level big data processing development environment,
- abstract programming templates based on data processing patterns,
- a cloud-based compiler and code generator,
- a parallel execution engine based on Spark,
- a workflow editor and execution engine,
- and a centralized resource management system.
The infrastructure allows researchers and engineers to store their large domain data sets, and it facilitates data processing algorithm research by providing a highly productive development environment and scalable performance. More importantly, the infrastructure allows faculty and students in different groups and institutions to share their research results and enables deep collaboration.
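The idea of a programming template that hides parallelism can be illustrated with a minimal sketch. This is not the project's actual template API: the names `map_template` and `invert_pixel_row` are hypothetical, and a local thread pool from the Python standard library stands in for the Spark engine so that the example runs without a cluster. The user writes only the sequential per-record kernel; the template decides how the records are distributed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def map_template(records: Iterable[T],
                 kernel: Callable[[T], R],
                 workers: int = 4) -> List[R]:
    """Hypothetical 'map' template: apply `kernel` to every record in parallel.

    The caller never touches threads or partitions. A thread pool is used
    here only as a stand-in for a distributed engine (assumption).
    Results are returned in the same order as the input records.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(kernel, records))


def invert_pixel_row(row: List[int]) -> List[int]:
    """User-supplied sequential kernel: invert one row of 8-bit pixels."""
    return [255 - p for p in row]


# A tiny two-row "image"; each row is one record handed to the template.
image = [[0, 128, 255], [10, 20, 30]]
inverted = map_template(image, invert_pixel_row)
```

On a Spark-based platform, the body of such a template would presumably hand the same kernel to the engine (for example through an RDD `map`) instead of a thread pool, while the user-facing code stays unchanged.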