Run Spark jobs faster at a fraction of the cost.

Xonai accelerates Spark jobs in your existing cloud data platform or private cloud environment. Activate today without code changes or migrations.

Up to 80% reduced cloud and server costs

Takes effect immediately upon activation
Multi-cloud
No code changes
No platform migrations
No access to data

No application code changes required to deliver up to 80% job time reduction for enterprise-grade Spark workloads on your platform of choice

Spark SQL · DataFrame · Parquet · UDF

Platform: Amazon EMR · AWS · Azure · GCP · Google Dataproc · Private Cloud
Engine: Powered by bleeding-edge compiler infrastructure
Hardware

- Spark API compatible
- Hadoop & Kubernetes compatible
- Runtime compatibility
- Spark SQL acceleration
- Caching acceleration
Faster Parquet reads: up to 5X to 6X faster than Apache Spark (Databricks, EMR, OSS)

Cache benchmark: Spark caches data compressed or uncompressed; XONAI caches with lz4, zstd, or uncompressed. Results are reported as a cache speedup factor and a cache storage reduction factor.

Without XONAI:

spark-submit \
 --conf spark.executor.memoryOverhead=1000m \
 --conf spark.executor.memory=3000m \
 ...

With XONAI:

spark-submit \
 --jars xonai-spark-plugin.jar \
 --conf spark.plugins=com.xonai.spark.SQLPlugin \
 --conf spark.executor.memoryOverhead=3000m \
 --conf spark.executor.memory=1000m \
 ...
Application run time: Spark 1 hour vs. XONAI 12 min (up to 80% job time reduction)

Coming soon!

Xonai Dashboard

An open-source Grafana-based application to assist Big Data infrastructure optimization initiatives where Spark applications are a dominant cost driver.

The Xonai Dashboard aggregates Spark execution metrics and spending estimates, from entire clusters down to each individual application, with the goal of exposing optimization opportunities.

Gain detailed visibility over cost and performance metrics of EMR clusters.

Understand how XONAI reduces Spark job costs and improves resource utilization.

See detailed execution and performance metrics unlocked by our engine for Spark applications.

XONAI for Apache Spark

Frequently Asked Questions

Our solution integrates with the open-source Apache Spark 3 distribution and the following data platforms:

- Amazon EMR up to 6.12.0

- Databricks up to 15.4 LTS

- Dataproc 2.0.x, 2.1.x, and 2.2.x release lines

Note that the Xonai Accelerator is updated frequently to support new Spark versions.

The solution is activated by a Spark 3 plugin that executes physical plans equivalent to the ones selected by the Spark runtime. In practice, the spark-submit command points to a JAR provided by us via the spark.plugins property.

Additionally, our engine requires moving a fraction of spark.executor.memory to the spark.executor.memoryOverhead setting. This change is currently needed because the Xonai engine allocates off-heap memory to process data rather than JVM heap memory; it will not be necessary in future releases, as both engines will share a unified memory architecture.
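Putting both adjustments together, a minimal activation sketch might look like the following (the JAR path, memory sizes, and application name are illustrative, not prescribed values):

```shell
# Hypothetical activation sketch: register the Xonai plugin JAR and shift a
# fraction of executor JVM memory into off-heap memory overhead.
# All values below are illustrative; size them for your own deployment.
spark-submit \
  --jars xonai-spark-plugin.jar \
  --conf spark.plugins=com.xonai.spark.SQLPlugin \
  --conf spark.executor.memory=1000m \
  --conf spark.executor.memoryOverhead=3000m \
  my_job.py
```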

Existing solutions tackle cloud spending by improving resource provisioning and/or tuning application parameters, and may provide only a one-time benefit, limited to workloads that were not optimally deployed.

Our solution accelerates Spark data processing speed far beyond the default Spark engine (Catalyst), and delivers seamless hardware acceleration and reduced resource utilization regardless of how optimally deployed Spark workloads already are.

No. We intentionally designed our engine to be API-compatible with existing Spark runtimes, including proprietary ones that may modify query plans to improve performance, such as the Databricks and EMR runtimes.

As Spark is an in-memory compute engine, the more time queries spend on physical computation between reads and writes, the more benefit they are expected to see. These are typically compute-intensive data transformation jobs with heavy aggregation, join, and sort stages.

A drop-in solution that can be activated in your cloud environment with no code changes to reduce cloud costs and accelerate insight delivery.

Reduce Spark cloud costs today