VeloxCon 2024: Innovation in data management – IBM Blog

VeloxCon 2024: Innovation in data management - IBM Blog

VeloxCon 2024, the premier developer conference that is dedicated to the Velox open-source project, brought together industry leaders, engineers, and enthusiasts to explore the latest advancements and collaborative efforts shaping the future of data management. Hosted by IBM® in partnership with Meta, VeloxCon showcased the latest innovation in Velox including project roadmap, Prestissimo (Presto-on-Velox), Gluten (Spark-on-Velox), hardware acceleration, and much more. An overview of Velox Velox is a unified execution engine that is built and open-sourced by Meta, aimed at accelerating data management systems and streamlining their development. One of the biggest benefits of Velox is that it consolidates and unifies data management systems so you don’t need to keep rewriting the engine. Today Velox is in various stages of integration with several data systems including Presto (Prestissimo), Spark (Gluten), PyTorch (TorchArrow), and Apache Arrow. You can read more about why Velox was built in Meta’s engineering blog. Velox at IBM Presto is the engine for watsonx.data, IBM’s open data lakehouse platform. Over the last year, we’ve been working hard on advancing Velox for Presto – Prestissimo – at IBM. Presto Java workers are being replaced by a C++ process based on Velox. We now have several committers to the Prestissimo project and continue to partner closely with Meta as we work on building Presto 2.0. Some of the key benefits of Prestissimo include: Hugh performance boost: query processing can be done with much smaller clusters No performance cliffs: no Java processes, JVM, or garbage collections, as memory arbitration improves efficiency Easier to build and operate at scale: Velox gives you reusable and extensible primitives across data engines (like Spark) This year, we plan to do even more with Prestissimo including: The Iceberg reader Production readiness (metrics collection with Prometheus) New Velox system implementation TPC-DS benchmark runs VeloxCon 2024 We worked closely with Meta to organize VeloxCon 2024, and it was a fantastic community event. We heard speakers from Meta, IBM, Pinterest, Intel, Microsoft, and others share what they’re working on and their vision for Velox over two dynamic days. Day 1 highlights The conference kicked off with sessions from Meta including Amit Purohit reaffirming Meta’s commitment to open source and community collaboration. Pedro Pedreira, alongside Manos Karpathiotakis and Deblina Gupta, delved into the concept of composability in data management, showcasing Velox’s versatility and its alignment with Arrow. Amit Dutta of Meta explored Prestissimo’s batch efficiency at Meta, shedding light on the advancements made in optimizing data processing workflows. Remus Lazar, VP Data & AI Software at IBM presented Velox’s journey within IBM and vision for its future. Aditi Pandit of IBM followed with insights into Prestissimo’s integration at IBM, highlighting feature enhancements and future plans. The afternoon sessions were equally insightful, with Jimmy Lu of Meta unveiling the latest optimizations and features in Velox. While Binwei Yang of Intel discussed the integration of Velox with the Apache Gluten project, emphasizing its global impact. Engineers from Pinterest and Microsoft shared their experiences of unlocking data query performance by using Velox and Gluten, showcasing tangible performance gains. The day concluded with sessions from Meta on Velox’s memory management by Xiaoxuan Meng and a glimpse into the new simple aggregation function interface that was presented by Wei He. Day 2 highlights The second day began with a keynote from Orri Erling, co-creator of Velox. He shared insights into Velox Wave and Accelerators, showcasing its potential for acceleration. Krishna Maheshwari from NeuroBlade highlighted their collaboration with the Velox community, introducing NeuroBlade’s SPU (SQL Processing Unit) and its transformative impact on Velox’s computational speed and efficiency. Sergei Lewis from Rivos explored the potential of offloading work to accelerators to enhance Velox’s pipeline performance. William Malpica and Amin Aramoon from Voltron Data introduced Theseus, a composable, scalable, distributed data analytics engine, using Velox as a CPU backend. Yoav Helfman from Meta unveiled Nimble, a cutting-edge columnar file format that is designed to enhance data storage and retrieval. Pedro Pedreira and Sridhar Anumandla from Meta elaborated on Velox’s new technical governance model, emphasizing its importance in guiding the project’s development sustainability. The day also featured sessions on Velox’s I/O optimizations by Deepak Majeti from IBM, strategies for safeguarding against Out-Of-Memory (OOM) kills by Vikram Joshi from ComputeAI, and a hands-on demo on debugging Velox applications by Deepak Majeti. What’s next with Velox VeloxCon 2024 was a testament to the vibrant ecosystem surrounding the Velox project, showcasing groundbreaking innovations and fostering collaboration among industry leaders and developers alike. The conference provided attendees with valuable insights, practical knowledge, and networking opportunities, solidifying Velox’s position as a leading open source project in the data management ecosystem. If you’re interested in learning more and joining the Velox community, here are some resources to get started: