Cloudera today unveiled the industry's first real-time query engine for Hadoop. This major evolution of the company's Platform for Big Data, Cloudera Enterprise, makes it the first Big Data management solution that allows batch and real-time operations to be performed on any type of data -- unstructured and structured -- within one massively scalable system.
This new approach of creating a single, centralized Big Data platform dramatically improves the economics and performance of large-scale data management in the enterprise. For the first time, organizations can process data at petabyte scale and, on the same system, interact with that data in real time to deliver "speed-of-thought" insights.
"Mainstream enterprise adoption of Hadoop will inevitably raise expectations," said Tony Baer, Principal Analyst for Ovum. "Enterprises have grown accustomed to interactive querying and on-the-spot analytics with their existing data warehousing and BI infrastructures and will expect no less of Hadoop. With a real-time query capability powered by its new Impala engine, Cloudera is striving to level the playing field in performance and accessibility with massively parallel SQL platforms."
This latest innovation from Cloudera, Cloudera Impala, takes the market-leading Apache Hadoop platform decisively "beyond batch." Impala is an Apache-licensed, real-time query engine for data stored in HDFS (Hadoop Distributed File System) and HBase, and is the result of two years of in-house development. Cloudera Enterprise RTQ (Real-Time Query) provides the management and support capabilities needed to operate Cloudera Impala effectively in production environments. Cloudera partners, including Capgemini Financial Services, Karmasphere, MicroStrategy, Pentaho, QlikView, and Tableau, have already validated their solutions with Cloudera Enterprise RTQ powered by Impala.
Apache Hadoop started as an offline, batch processing system. Subsequently, Hadoop was extended to service more interactive online workloads. First among these extensions was HBase, the distributed, tabular data store. Impala, the new open source project for real-time workloads announced by Cloudera today, introduces a scalable, distributed query engine to the Hadoop ecosystem. The technology was developed by Marcel Kornacker, Lead Architect of the Impala project, who previously co-architected the query engine for the F1 project at Google. Cloudera Impala offers a flexible data model, so it can work over more complex data than a data warehouse can, and it is efficient, with interactive queries expressed in industry-standard SQL. IT and business analysts can use it across a wide range of data types and data volumes to interact at the speed of thought with data stored in HDFS or HBase.
"Apache Hadoop has already transformed the industry, unlocking value from Big Data for enterprises around the world," said Mike Olson, CEO of Cloudera. "Until now, enterprises had to limit the work they did with Hadoop because batch-mode processing using MapReduce was just too slow for some business problems. With today's release of Cloudera Enterprise Real-Time Query powered by Impala, we solve that problem. Cloudera Impala complements MapReduce and is the latest addition to our one hundred percent open source Big Data platform. You can now store all your data in Hadoop and use the same hardware to do both powerful analytics and run real-time queries using industry-standard tools and the SQL language. This groundbreaking new project delivers the crucial next step in realizing our vision -- to let our customers Ask Bigger Questions of all their data. Cloudera Enterprise with Real-Time Query powered by Impala is a major advance on the Hadoop platform and opens up new possibilities for Big Data in the enterprise."