The ability to efficiently extract useful information from volumes of data distributed across myriad networks is hindered by the latencies inherent to magnetic storage devices and computer networks. We propose overcoming these limitations by leveraging solid-state drive (SSD) and field-programmable gate array (FPGA) technologies to process large streams of data directly at the storage sites.

Our proposed reconfigurable, active, solid-state drive (RASSD) platform consists of distributed nodes that couple SSDs with FPGAs. While SSDs store data, FPGAs implement processing elements that couple soft-core RISC processors with dynamically reconfigurable logic resources. The processors execute data-processing software drivelets, and the logic resources implement hardware for accelerating performance-critical operations. Executing appropriate drivelets and using matching hardware accelerators enables us to efficiently process streams of data stored across SSDs.

To manage the configuration of RASSD nodes and provide a transparent interface to applications, our platform also includes distributed middleware software. Client local middleware (CLM) resides on client host machines to interpret application data-processing requests, locate storage sites, and exchange requests and results with middleware servers (MWS). Each MWS connects to a cluster of RASSD nodes and holds libraries of drivelets and accelerator configuration bit streams. An MWS loads the appropriate drivelet and accelerator bit stream onto a RASSD node's FPGA, aggregates the processed data, and returns it to a CLM.
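The request path described above (CLM to MWS to RASSD nodes and back) can be illustrated with a minimal Python sketch. It is a software model only and not the platform's actual interface: every class, method, and file name below (`RassdNode`, `MiddlewareServer`, `ClientLocalMiddleware`, `keyword_drivelet`, `keyword_search.bit`) is a hypothetical placeholder, and the FPGA bit stream is represented by a plain string.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical names throughout; the source does not define an API.
Drivelet = Callable[[List[str], str], List[str]]


@dataclass
class RassdNode:
    """Stands in for one SSD+FPGA node; `records` models the stored data."""
    records: List[str]
    drivelet: Optional[Drivelet] = None
    bitstream: str = ""

    def load(self, drivelet: Drivelet, bitstream: str) -> None:
        # Models the MWS configuring the node's FPGA.
        self.drivelet = drivelet
        self.bitstream = bitstream

    def run(self, arg: str) -> List[str]:
        if self.drivelet is None:
            raise RuntimeError("no drivelet loaded")
        # Processing happens at the storage site; only results leave the node.
        return self.drivelet(self.records, arg)


def keyword_drivelet(records: List[str], keyword: str) -> List[str]:
    """Toy drivelet: keep the records that contain the keyword."""
    return [r for r in records if keyword in r]


@dataclass
class MiddlewareServer:
    """Models an MWS: a drivelet/bit-stream library plus a node cluster."""
    nodes: List[RassdNode]
    library: Dict[str, Tuple[Drivelet, str]] = field(default_factory=dict)

    def handle(self, operation: str, arg: str) -> List[str]:
        drivelet, bitstream = self.library[operation]
        results: List[str] = []
        for node in self.nodes:
            node.load(drivelet, bitstream)
            results.extend(node.run(arg))  # aggregate per-node results
        return results


class ClientLocalMiddleware:
    """Models a CLM: forwards a request to each known MWS."""

    def __init__(self, servers: List[MiddlewareServer]) -> None:
        self.servers = servers

    def request(self, operation: str, arg: str) -> List[str]:
        results: List[str] = []
        for mws in self.servers:
            results.extend(mws.handle(operation, arg))
        return results


def demo() -> List[str]:
    nodes = [RassdNode(["alpha beta", "gamma"]), RassdNode(["beta delta"])]
    library = {"keyword_search": (keyword_drivelet, "keyword_search.bit")}
    clm = ClientLocalMiddleware([MiddlewareServer(nodes, library)])
    return clm.request("keyword_search", "beta")
```

In this sketch the application only calls `ClientLocalMiddleware.request`, which mirrors the transparent interface the middleware is meant to provide; node selection, configuration, and aggregation stay inside the middleware classes.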

To evaluate our platform, we implemented a simple system consisting of a host computer connected to a RASSD node over a peer-to-peer network. We ran a keyword search application on the host computer, which also provided middleware functionality. We then evaluated this system under three configurations. In configuration C1, the RASSD node was used only to store data, and all data was processed by the MWS running on the host computer. In configuration C2, the data was processed by a drivelet running on the RASSD node. Finally, in configuration C3, the data was processed by both a drivelet and a hardware accelerator.

Our experimental results show that C3 is 2x faster than C2 and 6x faster than C1. This demonstrates our platform's potential for improving the performance of data-intensive applications relative to current systems.
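The two reported ratios also fix the relative performance of C2 and C1. As a short worked step, assume that "Nx faster" denotes the ratio of run times on the same workload, and write T_1, T_2, and T_3 for the run times of C1, C2, and C3 (symbols introduced here for illustration). The resulting factor of 3 is derived from the reported figures rather than stated in the source.

```latex
\[
\frac{T_2}{T_3} = 2, \qquad \frac{T_1}{T_3} = 6
\quad\Longrightarrow\quad
\frac{T_1}{T_2} = \frac{T_1 / T_3}{T_2 / T_3} = \frac{6}{2} = 3
\]
```

Under that assumption, moving the processing from the host to a drivelet on the node accounts for a 3x speedup, and the hardware accelerator contributes the remaining 2x.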

