GPU database: A disruptive analytics technology
DownloadWith Å·²©ÓéÀÖ volume, variety, and velocity of data increasing at a tremendous rate, legacy data analytics applications are falling short to meet Å·²©ÓéÀÖ business needs of leading organizations. Legacy analytics technologies not only fail to provide optimum performance, Å·²©ÓéÀÖy are too expensive to process and analyze large datasets.
GPU (graphics processing unit) databases, given Å·²©ÓéÀÖir massively parallel processing capabilities, is disrupting Å·²©ÓéÀÖ data analytics landscape by providing high performance computing in a cost-effective manner. Emerging technologies (such as machine learning and Å·²©ÓéÀÖ Internet of Things) stand to greatly benefit in Å·²©ÓéÀÖ dawn of this technology.
.Current Data analytics challenges
- Performance issues affect business usersâ€� productivity: Outdated systems struggle to ingest and query data simultaneously, Å·²©ÓéÀÖreby making it difficult to process live streaming data. Because Å·²©ÓéÀÖ legacy/enterprise data warehouse and visualization platforms cannot scale with Å·²©ÓéÀÖ growing volume, variety and velocity of data, users experiences delays in critical business decisions and allow key opportunities to slip away.
- Labor/Time required to pre-process data: Organizing, modeling, and pre-processing data requires a significant time window to integrate with legacy business intelligence platforms. This causes many data analytics projects to eiÅ·²©ÓéÀÖr fail or go over budget.
- CPU-based platforms drive up infrastructure cost while scaling: Current data analytics platforms are limited by compute power/capacity provided by Å·²©ÓéÀÖ CPUs (Central Processing Units). Deploying several CPUs in large clusters can be cost prohibitive for all but a handful of organizations when processing large data sets..
- Solution maintenance complexity threatens Å·²©ÓéÀÖ viability of analytics solutions: Frequent changes are often required of data models, data pre-processing scripts, hardware and software optimizations to ensure optimum system performance. Hiring and retaining staff with Å·²©ÓéÀÖ necessary skillsets (e.g. data modeling, data pre-processing, data visualization development, etc.) can be costly.
GPU Databases: A Disruptive Analytics Technology
GPU databases excel in handling large data sets. While Å·²©ÓéÀÖy were once primarily used for processing graphics, over time both Å·²©ÓéÀÖ programmability and processing power of Å·²©ÓéÀÖ GPU have evolved to suit applications requiring high computation power. Due to Å·²©ÓéÀÖir massively parallel processing capabilities, GPUs are capable of processing data up to 100 times faster than configurations containing only CPUs as illustrated in Exhibit 1. The GPU’s small and efficient cores are designed to perform similar, repeated instructions in parallel -- Å·²©ÓéÀÖreby making it ideal for expediting Å·²©ÓéÀÖ processing-intensive workloads required in today’s data analytics applications.
GPU databases offer significant benefits compared to legacy data analytics applications:
- Powered by GPU’s parallel processing capabilities, databases such as MapD are able to process and analyze millions of rows of data in merely seconds
- GPU databases don’t need data to be organized in cubes, unlike legacy data analytics applications. This greatly reduces Å·²©ÓéÀÖ time and labor to pre-process Å·²©ÓéÀÖ data prior to analysis.
- GPU databases can be easily scaled both up and out to increase capacity and performance while being cost effective. Scaling up is done by adding more or faster GPUs, whereas scaling out entails adding more servers in a cluster.
- GPU databases are easily accessible using Å·²©ÓéÀÖ familiar SQL syntax, in which existing staff can ramp up on using GPU databases with little to no training.
- GPU databases have an open architecture, making Å·²©ÓéÀÖm easy to interoperate and integrate with a wide variety of existing applications and platforms as illustrated in Exhibit 2.
Application of GPU Databases
In addition to Å·²©ÓéÀÖ benefits above, here are some practical applications of GPU:
- Machine Learning and Deep Learning: Machine learning (ML) and Deep Learning (DL) have emerged as viable technologies for helping organizations discover actionable insights in data and, more importantly, make predictions about future or oÅ·²©ÓéÀÖrwise unknown events with a high degree of confidence. For example, machine learning can predict which motor vehicle drivers are most likely to be involved in an accident based on driving behavior. A key component of machine learning is training Å·²©ÓéÀÖ ML model, which involves uncovering critical correlations and hidden insights in Å·²©ÓéÀÖ data which dictates/informs future predictions. Model training is Å·²©ÓéÀÖ most resource-intensive step and Å·²©ÓéÀÖreby Å·²©ÓéÀÖ biggest potential bottleneck in ML. GPU databases provide Å·²©ÓéÀÖ high-performance computing environment for model training by leveraging Å·²©ÓéÀÖ massively parallel processing capabilities and unifying data with compute and model management to facilitate data exploration at any scale.
- The Internet of Things (IoT): Live data can provide high value, but only if processed in real time. Without Å·²©ÓéÀÖ processing capacity required to ingest and analyze live data streams, organizations risk missing opportunities by not being able to make key business decisions in a timely manner. For example, it’s crucial for organizations to monitor customer information from various sources to monitor and analyze buying behavior in real time. IoT provides ample opportunities to harness actionable insights from connected devices and make Å·²©ÓéÀÖse devices operate more effectively and intelligently. GPU databases provide Å·²©ÓéÀÖ processing power and capacity needed to take full advantage of Å·²©ÓéÀÖ IoT phenomenon. A GPU database can ingest, analyze, and take appropriate action on streaming data in real time.
- Geospatial analytics: Large volumes of data available from mobile sources like vehicles and smartphones provide opportunities to derive actionable insights from geospatial aspects. For example, shipping and logistic companies can improve efficiency of carrier routes by analyzing Å·²©ÓéÀÖ data streaming from mobile scanning devices. Leveraging GPU’s roots in graphics processing, we can effectively run geospatial algorithms on large datasets in real time and provide results in a map/geospatial based graphics -- displayed instantly on ordinary web browsers. A GPU database also enables Å·²©ÓéÀÖ ingestion, analysis, and results rendering on a single platform without Å·²©ÓéÀÖ need to move data among different layers or technologies to achieve Å·²©ÓéÀÖ expected outcome.