Big data shows itself in the streets, where cars stream past endlessly and navigation data guides us around congestion; it is embedded in street cameras, making urban governance more scientific; and it lives in the machine logs of ultra-large-scale data centers, where mining massive logs allows faults to be identified and risks eliminated immediately. Data permeates everything. Given the enormous volume and complicated variety of this data, enterprises need to process it as quickly as possible and extract its value.
A large amount of new data is generated every day. It should not merely be saved on hard drives; it must also be processed efficiently so that the value it reflects can be uncovered and ultimately put to work in production and daily life. If the data is not well leveraged, it wastes cost and creates further problems.
With Spark as the computing engine and HDFS as the unified storage base, a three-level cache solution consisting of "memory + NVMe SSD + HDD" is adopted. Built around Spark RDD in-memory computing, it provides a unified platform for data access, processing, and analysis.
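As a rough sketch of how this tiering looks from the application side, assuming a standard PySpark setup (the path and dataset below are hypothetical), Spark's persist() API keeps hot intermediate data in memory and spills the overflow to local disk, while the full dataset stays on HDD-backed HDFS:

```python
from pyspark import SparkContext, StorageLevel

# Hypothetical example: cache hot intermediate results in memory,
# spilling to local disk (the NVMe SSD tier) when memory is full,
# while the full dataset remains on HDD-backed HDFS.
sc = SparkContext(appName="tiered-cache-sketch")

# Cold tier: raw logs live on HDFS (HDD). Path is illustrative.
raw_logs = sc.textFile("hdfs:///data/logs/2023/*")

# Hot tier: parsed records are persisted to memory first,
# overflowing to local disk (NVMe) rather than being recomputed.
parsed = raw_logs.map(lambda line: line.split("\t")) \
                 .persist(StorageLevel.MEMORY_AND_DISK)

# Repeated actions now hit the memory/NVMe cache instead of HDD.
print(parsed.count())
print(parsed.filter(lambda rec: rec and rec[0] == "ERROR").count())
```

In such a deployment, the spill location is controlled by Spark's spark.local.dir property, which would point at the NVMe SSD mount points.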
In the deployment practice, the NVMe SSDs and mechanical hard drives are managed in a unified manner by the acceleration engine and then presented to the distributed software for use.
The NVMe SSD layer absorbs small random read/write blocks, improving the cluster's random I/O capability severalfold.
Tests show that when an NVMe SSD is used as the journal (log) disk for mechanical-hard-drive OSDs, the overall I/O performance of the server improves by up to 6 times compared with a pure mechanical-hard-drive cluster that uses no journal disk.
By deploying one or two PBlaze Series NVMe SSDs at each storage node while keeping the architecture and applications of the existing ES (Elasticsearch) logging system unchanged, the new IT system meets both the performance and the capacity requirements.
This resolves the storage capacity dilemma of the ES system: a single server node can write 200,000 ES documents per second, and queries return within seconds.
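As an illustration of the write path behind such numbers, the sketch below uses the bulk helper of the elasticsearch Python client (8.x API assumed; the host, index name, and document fields are hypothetical). Sustained ingest at this rate comes from batched bulk requests rather than single-document writes:

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

# Hypothetical logging cluster; host and index name are illustrative.
es = Elasticsearch(["http://es-node1:9200"])

def log_actions(lines):
    """Turn raw log lines into bulk index actions."""
    for line in lines:
        yield {
            "_index": "app-logs-2023.06",
            "_source": {"message": line},
        }

# Bulk indexing batches many documents per request, which is how a
# single node can sustain write rates on the order of 10^5 docs/s.
lines = ("log line %d" % i for i in range(10000))
success, errors = bulk(es, log_actions(lines), chunk_size=5000)
print(success, errors)

# Query side: a simple full-text search that should return in seconds.
resp = es.search(index="app-logs-2023.06",
                 query={"match": {"message": "error"}},
                 size=10)
print(resp["hits"]["total"])
```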
To extract information from video archives at EB scale, the people and vehicles in every camera frame must be extracted into structured records as completely as possible. Storing and analyzing this structured data on a Hadoop HDFS + Spark cluster, with PBlaze Series NVMe SSDs caching the structured-data indexes, significantly improves capacity, performance, reliability, and maintainability.
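A minimal sketch of what the structured records enable downstream, assuming they are stored as Parquet files on HDFS (the schema, paths, and field names are hypothetical): Spark SQL can then answer person/vehicle queries directly over the cluster.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("video-metadata-sketch").getOrCreate()

# Hypothetical structured records produced per camera frame:
# camera_id, ts, object_type ("person"/"vehicle"), plate, color, ...
detections = spark.read.parquet("hdfs:///video/structured/detections/")
detections.createOrReplaceTempView("detections")

# Example query: white vehicles seen by camera 42 during a time window.
hits = spark.sql("""
    SELECT camera_id, ts, plate
    FROM detections
    WHERE object_type = 'vehicle'
      AND color = 'white'
      AND camera_id = 42
      AND ts BETWEEN '2023-06-01 08:00:00' AND '2023-06-01 09:00:00'
""")
hits.show(20)
```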
With the patent server, the customer can handle the system's data processing and analysis more efficiently, and the query and analysis capabilities of the overall solution have improved severalfold.
The SATA SSDs in the original servers carried multiple risks. NVMe SSDs not only offer sound data protection technology but also simplify the system architecture, making the overall solution more reliable and easier to maintain.