Spark is a general-purpose big data analytics engine characterized by high performance and ease of use.
The Spark architecture, as shown in the diagram below, is built on Spark Core and has four main libraries: Spark SQL, Spark Streaming, MLlib, and GraphX. Together, these libraries cover scenarios such as offline ETL (Extract-Transform-Load) and online data analysis (Spark SQL), stream computing (Spark Streaming), machine learning (MLlib), and graph computing (GraphX).
The spark-history-server component depends on HDFS, which stores the event logs of Spark applications. The History Server periodically scans the event log directory to discover new or updated log files and parses them.
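The scan-and-parse behavior described above can be illustrated with a minimal sketch. It is not the History Server's actual implementation: it assumes a local temporary directory in place of HDFS, detects changes by file size and modification time, and uses hypothetical names (`scan_log_dir`, `demo`, `app-0001`). In Spark itself, the scanned directory and the scan interval are set through the `spark.history.fs.logDirectory` and `spark.history.fs.update.interval` properties.

```python
import os
import tempfile


def scan_log_dir(log_dir, seen):
    """Return paths that are new or changed since the previous scan.

    `seen` maps path -> (size, mtime_ns) from earlier scans and is updated in place.
    """
    changed = []
    for entry in sorted(os.scandir(log_dir), key=lambda e: e.name):
        if not entry.is_file():
            continue
        stat = entry.stat()
        fingerprint = (stat.st_size, stat.st_mtime_ns)
        if seen.get(entry.path) != fingerprint:
            seen[entry.path] = fingerprint
            changed.append(entry.path)  # a real server would parse the log here
    return changed


def demo():
    """Run three scans over a temporary directory standing in for the HDFS log directory."""
    with tempfile.TemporaryDirectory() as log_dir:
        seen = {}
        log_path = os.path.join(log_dir, "app-0001")  # hypothetical log file name
        with open(log_path, "w") as f:
            f.write("event-1\n")
        # First scan: the file is new, so it is reported.
        first = [os.path.basename(p) for p in scan_log_dir(log_dir, seen)]
        # Second scan: nothing changed, so nothing is reported.
        second = [os.path.basename(p) for p in scan_log_dir(log_dir, seen)]
        with open(log_path, "a") as f:
            f.write("event-2\n")
        # Third scan: the file grew, so it is reported again.
        third = [os.path.basename(p) for p in scan_log_dir(log_dir, seen)]
        return first, second, third


if __name__ == "__main__":
    print(demo())
```

Each scan reports only files whose fingerprint differs from the previous pass, which is why a running application's log is picked up again as it grows while unchanged logs are skipped.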
The default configuration can be used when installing the application.
After installation, the application instance details page displays the application access address and provides operations and maintenance actions such as updating and uninstalling the application.