Ultrafast Data Acquisition—UFDAC

Science experiments in the past decade have seen enormous growth in the amount of data they generate. As the repetition rates of advanced light sources increase and their detectors become faster, with higher dynamic ranges and frame rates, these facilities must acquire data quickly enough to cope with unprecedented amounts of incoming information. Equally important is the ability to visualize experimental data and extract preliminary results, as both capabilities enable the scientists using these facilities to perform more successful, efficient experiments.

Meeting the challenge of ultrafast processing of big data

Processing, storing, and analyzing data is no trivial matter in any case, let alone at advanced light sources. Individual instruments at these facilities can produce data at hundreds of gigabytes per second, meaning even terabyte-scale storage systems would fill up in seconds. Most of this data is generated by the complex detectors and diagnostic systems needed to carry out experiments, each of which delivers data at rates on the order of gigabytes per second. Latency, or the time it takes to process and record the data, is also an issue, as it can cause a traffic jam of sorts on data servers, potentially risking the loss of data and affecting results. Scientists need their data to move from detector to storage fast enough to clear the way for the next data set, an operation that requires transferring massive amounts of information in a fraction of a second. The data also need to be recorded in a standardized format that can be read later.
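A quick back-of-the-envelope calculation makes the scale of the problem concrete. The sketch below is illustrative only (the specific rates and storage sizes are assumptions, not measured UFDAC figures):

```python
# Illustrative arithmetic (not UFDAC code): how quickly storage fills
# at the data rates quoted for advanced light sources.

def fill_time_seconds(storage_bytes: float, rate_bytes_per_s: float) -> float:
    """Time until a storage system of the given size is full."""
    return storage_bytes / rate_bytes_per_s

GB = 1e9
TB = 1e12

# Assumed values: an instrument producing 200 GB/s writing into a
# 10 TB buffer. Both numbers are hypothetical round figures.
print(fill_time_seconds(10 * TB, 200 * GB))  # 50.0 seconds
```

Even a generous multi-terabyte buffer survives for less than a minute at these rates, which is why sustained transfer speed and low latency, rather than raw storage capacity, become the limiting factors.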

To help ameliorate this issue, UFDAC scientists worked on a suite of algorithm libraries that deliver data to scientists reliably and at the highest possible throughput. By coordinating the efforts of data transfer experts and programmers, UFDAC was able to push the boundaries of transfer speeds and online processing.

Image: UFDAC algorithms are pushing data transfer to its limits. UFDAC worked on pushing transfer rates to their theoretical maximums, avoiding unnecessary use of computing resources and intermediate copies in host memory as data moves from detector to processing unit. UFDAC algorithms are able to reach the current physical limits between host memory and processing unit (host to host, shown in blue and orange), no matter which hardware platform is used. The transfer rate between processing units, or between detector interface and processing unit (device to device, shown in gold and grey), is still under development. “K20” and “K80” refer to the type of processor card used in this study. (Credit: ELI-ALPS)

UFDAC's firmware is designed as a general solution to the data transfer problem, one that works at as many facilities as possible. UFDAC used a framework called RASHPA, developed at ESRF through the earlier EU-funded project CRISP, as the basis for new data pipelines that decrease processing time. RASHPA employs high-speed data connections inside computers, and UFDAC scientists experimented with longer physical connections that extend these high-speed links between the detector and the processing unit. With the improved data pipelines, transfer using UFDAC algorithms can reach 10 to 20 GB per second per individual data source, in a far more generic and easy-to-use way than commonly available solutions. This helps with online processing and keeps data moving from detector to data server at a more continuous rate, something critical as detectors continue to record more complex data more quickly.
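The principle of avoiding intermediate copies can be illustrated in miniature. The following Python sketch is not RASHPA or UFDAC code; it simply contrasts a slicing operation that duplicates data with a zero-copy view of the same buffer, the same idea that, at hardware scale, keeps data out of host memory on its way from detector to processing unit:

```python
# Minimal illustration (not RASHPA itself) of why intermediate copies
# matter: slicing a bytes-like object copies data, while a memoryview
# exposes the same underlying buffer without copying.

buf = bytearray(8 * 1024 * 1024)   # stand-in for an 8 MiB detector frame

# Copying path: the slice allocates and fills a brand-new object.
copied = bytes(buf[:1024])

# Zero-copy path: the view shares the original buffer.
view = memoryview(buf)[:1024]

# A write through the view is visible in the original buffer,
# proving no copy was made.
view[0] = 0xFF
assert buf[0] == 0xFF

# The earlier copy is unaffected, because it owns separate memory.
assert copied[0] == 0x00
```

At gigabytes per second, every avoided copy saves both memory bandwidth and latency, which is why zero-copy paths are central to reaching the physical limits of the hardware.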

Online processing solutions

Processing data as soon as it is acquired by a detector is known as online processing. To make online processing more efficient, scientists had to learn to deal with different processor types, including FPGAs and GPUs. UFDAC algorithms can adapt to these environments regardless of processor type, in part through a library called Alpaka, which allows the same code to run on different processing hardware. The algorithms temporarily reorganize the data so calculations can be performed in an efficient, parallel manner, and a later algorithm returns the data to its original structure for recording.

Online processing also increases the efficiency of advanced light sources. Since information is processed immediately, scientists can get real-time pictures of what the incoming X-rays look like or how their samples are interacting with the light. For example, EUCALL's PUCCA group developed a timing tool that allows scientists to monitor the time difference between two incoming pulses of light down to a millionth of a billionth of a second. Instead of learning only after the experiment that settings needed adjustment, they can now adjust the timing on the fly, thanks to UFDAC algorithms. This gives them the chance to get better data during a single allocated experiment time, instead of needing to return to an advanced light source numerous times to perform a similar experiment and acquire better data.
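The reorganize-then-restore pattern described above can be sketched in a few lines. This is a toy stand-in, not UFDAC's actual kernels: the interleaved (value, timestamp) record layout and the doubling operation are illustrative assumptions.

```python
# Sketch of the reorganize -> parallel compute -> restore pattern.
# The interleaved (value, timestamp) layout is an assumed example,
# not UFDAC's real detector format.

frame = list(range(12))            # [v0, t0, v1, t1, ...]

# Reorganize: de-interleave so each stream is contiguous and can be
# handed to a data-parallel kernel in one pass.
values = frame[0::2]
stamps = frame[1::2]

values = [v * 2 for v in values]   # stand-in for the parallel kernel

# Restore: re-interleave into the original record layout for recording.
restored = [0] * len(frame)
restored[0::2] = values
restored[1::2] = stamps

print(restored)  # [0, 1, 4, 3, 8, 5, 12, 7, 16, 9, 20, 11]
```

On real hardware the same idea lets GPUs and FPGAs read memory contiguously, which is usually far faster than striding through interleaved records.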

Resilient methods

UFDAC’s solutions for the data flood also help deal with the unexpected. If an outage occurs, whether caused by technical problems or by overload on certain pipelines, the frameworks developed for UFDAC can respond by making the data flow a bit like water, taking the path of least resistance. By using different, much less occupied pipelines in the system, the data flow can continue, allowing the system to operate at a similar speed without risking the loss of valuable information. This resiliency is meant to boost performance for high-quantity data acquisition.
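The water-like rerouting idea can be sketched as a simple least-loaded routing loop. This is a hedged toy model, not UFDAC's framework; the node names and chunk sizes are made up for illustration:

```python
# Toy sketch (not UFDAC's framework) of resilient routing: each data
# chunk goes to the least-loaded node still alive, so when a node
# fails its traffic simply flows to the remaining ones.

def route(chunks, nodes):
    """Assign each chunk size to the least-loaded available node."""
    load = {n: 0 for n in nodes}
    assignment = []
    for size in chunks:
        if not load:
            raise RuntimeError("all nodes down; data would be lost")
        target = min(load, key=load.get)   # path of least resistance
        load[target] += size
        assignment.append(target)
    return assignment, load

chunks = [4, 4, 4, 4, 4, 4]

# All three nodes up: the load spreads evenly.
_, load = route(chunks, ["node-a", "node-b", "node-c"])
print(load)   # {'node-a': 8, 'node-b': 8, 'node-c': 8}

# Simulated outage: node-c drops out, and the same traffic simply
# flows over the two remaining nodes.
_, load = route(chunks, ["node-a", "node-b"])
print(load)   # {'node-a': 12, 'node-b': 12}
```

The real systems must of course do this with in-flight data and without a global view of load, but the governing idea is the same: no single channel is a point of failure for the overall flow.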


Image: UFDAC’s algorithms are resilient, allowing data transfer to continue without interruption even as the channels, or nodes, through which information can flow become unavailable. In this test, data were transferred at a rate between 6 and 7 GB per second even as the number of nodes dropped to just one; the transfer rate did not fall off until the last node shut down. (Credit: HZDR)

UFDAC Deliverables

Deliverable 5.1 - Report on online 2D image processing (Report about results of online 2D Image processing Task) / Submitted September 2018


Deliverable 5.2 - Report on high-speed data transfer and data injection (Report on results of high-speed data transfer and data injection task) / Submitted September 2018


Deliverable 5.3 - Report on online processing of digitizer data (Report on results of online processing of digitizer data task) / Submitted September 2018