Cloudera Democratizes Apache Hadoop For Enterprise End Users With Open Source, Interactive Search

Press Release | Cloudera | June 4, 2013

Latest Offering From Leading Big Data Vendor Extends the Capabilities of Hadoop, Offering Easy and Familiar Access to Data for Increased Visibility and Quicker Time to Insight

PALO ALTO, CA and SAN FRANCISCO, CA--(Marketwired - Jun 4, 2013) - From The Economist Information Forum in San Francisco, Cloudera, the leader in enterprise analytic data management powered by Apache Hadoop™, today announced the public beta of Cloudera Search, the industry's first fully integrated search engine for interactive exploration of data stored in the Hadoop Distributed File System (HDFS) and Apache HBase™. The latest in a series of innovations from Cloudera designed to simplify Hadoop and extend its usability to more departments of an organization, Cloudera Search is powered by the leading open source search engine, Apache Solr™, and enables anyone within an organization to perform interactive, natural language keyword searches and faceted navigation on data stored in Hadoop, without additional training or advanced programming knowledge.

Cloudera Search was developed to address a rapidly emerging need, as enterprises' Hadoop deployments mature and advance to become the primary repositories for more and more kinds of data: how to better and more quickly combine and refine data into a single, integrated platform. At its core, Cloudera Search incorporates Apache Solr and other search-related open source projects to support a comprehensive big data infrastructure, and to alleviate the significant costs of maintaining the disparate systems that many enterprises currently depend on to execute search queries.

The arrival of Cloudera Search provides the enterprise with breakthrough simplicity and exploration capabilities, so users can drill down deeper into data using full-text and faceted search to solve critical business problems in real time. Cloudera's search solution pairs the established, feature-rich, open source Solr search platform, and its extensible APIs for easy integration with production legacy systems, with CDH integration that addresses many of the common pain points of standalone search solutions for Hadoop. Through the new, robust failover features available in SolrCloud (Solr4), Cloudera Search delivers the same feature set as the search platform with more scalable indexing and query serving than was ever previously possible.

Like Cloudera Impala, the industry's first open source, interactive SQL query engine for Hadoop, Cloudera Search extends the reach and capability of Cloudera Enterprise, the definitive Platform for Big Data. Cloudera is now making it possible for enterprises to "unaccept the status quo" imposed by closed source solutions vendors and benefit from the superior economics and unparalleled opportunity of Hadoop as a central, enterprise data platform that addresses the challenges and opportunities presented by big data.

Beyond SQL: Now Everyone Can Benefit from Hadoop
As enterprises increasingly look for ways to derive greater value from all their data, a pervasive challenge has emerged: how to make all data available and consumable beyond IT departments, so it can be more widely leveraged across an entire organization. Cloudera's search solution expands the data exploration capabilities of Hadoop with faceted navigation and full-text search to more quickly find data for processing and analysis. Cloudera Search puts the power of data discovery into the hands of non-technical teams, enabling line-of-business and everyday users to interact with and uncover relevant correlations from data in a familiar, easy-to-use search interface. Companies can provide secure access to a centralized data repository, make it accessible to anyone who wants to derive valuable insight, and consolidate search and Hadoop cluster investments into one complete solution with unified management and control through Cloudera Manager.

"Data is one of the most valuable assets we have when it comes to preventative mental and physical healthcare," said Chris Poulin, managing partner of Patterns and Predictions. "With next generation predictive analytics tools powered by Hadoop, healthcare providers can now address healthcare issues proactively and hope to solve even the most intractable challenges, like suicide prevention for military veterans. With the power to correlate medical reports, patient records, care provider notes, and social media data along with other relevant data sources, we can cultivate a deeper, more holistic understanding of patients and disease to support better treatment plans and optimize patient care. By giving non-technical individuals the power to perform real-time search and queries on data stored in Hadoop, Cloudera is providing critical tools to advance healthcare innovation and discovery."

Beyond Batch: Real-Time Interaction with Data in Hadoop
Cloudera Search provides enterprises scalable indexing options for big data and extends the Apache Solr project to offer near real-time document processing and indexing of data in transit to Hadoop and other storage endpoints. Data is immediately available to Search and other Hadoop computing frameworks, like Apache Hive™ and Cloudera Impala. Cloudera Search also provides linearly scalable batch indexing for large data stores within Hadoop on demand and, with the introduction of an innovative GoLive feature, can now incorporate incremental index changes while avoiding costly downtime.

"We have been leveraging Cloudera Search for OpenStack log exploration with great success. It delivers an open source solution for near real-time operational insights stored in Hadoop, and supports faster analytics and time to insight through applications like Cloudera Impala and other workloads," said Joseph George, director of product strategy in Dell's Revolutionary Solutions Team. "With Cloudera Search, Hadoop has become the master data hub, where search indexes can be easily built on demand, executed, stored and easily managed."

"It's exciting to see Lucene, a project I started 15 years ago, be included in CDH," said Doug Cutting, Chief Architect, Cloudera. "Search is an incredibly powerful tool -- now it's scalable and integrated with the Hadoop platform."

Cloudera Search Feature Highlights
Cloudera Search is specifically designed to support business users in their quest to locate relevant data quickly and efficiently in Hadoop, for further processing and analysis. Cloudera Search is fully integrated with the CDH platform. Key features include:

  • Scalable, Reliable Index Storage in HDFS: integrates index storage and serving directly into HDFS
  • Batch Indexing via MapReduce: allows index creation for data stored in HDFS and HBase that is as scalable and robust as MapReduce
  • Real-time Indexing at Collection: makes an event searchable as it is stored into Hadoop through near real-time indexing features powered by Apache Flume™
  • Easy Interaction and Data Exploration via Cloudera Hue: provides a plug-in application for Hue and easy-to-install capabilities for standard Hue servers to query data and view result files, and enables faceted exploration
  • Simplified Field Extraction and Cross-Platform Data Processing: allows for quick and easy field extraction of any data that is stored into HDFS using optimized Hadoop file formats, such as Apache Avro™, avoiding the pain that many standalone search solutions might impose, and promotes reusable configurations and processing activities with the new processing framework, Cloudera Morphlines
  • Unified Management and Monitoring with Cloudera Manager: provides a centralized management and monitoring experience that makes it as easy to deploy, configure, and monitor search services as it is to manage CDH deployments and other services on the Hadoop cluster
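
As a rough illustration of the keyword search and faceted exploration described above, the following Python sketch builds a query URL using standard Apache Solr request parameters (q, rows, wt, facet, facet.field). The host name, collection name, and field names are hypothetical placeholders chosen for this example, and the sketch is not taken from Cloudera Search documentation; it only shows what a faceted Solr request generally looks like.

```python
from urllib.parse import urlencode


def build_faceted_query_url(base_url, collection, keywords, facet_fields, rows=10):
    """Build a Solr select URL for a keyword search with field facets.

    base_url, collection, and facet_fields are illustrative inputs; the
    parameter names (q, rows, wt, facet, facet.field) are standard Solr
    query parameters.
    """
    params = [
        ("q", keywords),      # full-text keyword query
        ("rows", str(rows)),  # number of matching documents to return
        ("wt", "json"),       # ask for a JSON response
        ("facet", "true"),    # turn on faceting
    ]
    # One facet.field parameter per field to count values for.
    params.extend(("facet.field", field) for field in facet_fields)
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection, and fields (assumptions for illustration only).
url = build_faceted_query_url(
    "http://search-host.example.com:8983/solr",
    "logs",
    "connection timeout",
    ["hostname", "severity"],
)
print(url)
```

Sending such a request to a Solr server would typically return the matching documents together with per-field value counts, which a search interface can present as the drill-down filters of faceted navigation.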

"We're bringing the band back together with Cloudera Search," said Mike Olson, chief executive officer, Cloudera. "Based on 100% open source Apache Solr, a Lucene project and another Doug Cutting original, Cloudera Search is now fully integrated into our industry leading CDH big data platform. After a successful private beta, it's the latest in a series of major innovations that we've brought to market designed to speed up and simplify an organization's ability to get the most out of their data. We are further democratizing access to mission-critical information stored in Hadoop by ensuring those without programming expertise can gain insight, find patterns and derive true value from their information assets. Year after year we continue to push the boundaries of what is possible with Hadoop; we have the best minds in data management focused on advancing business transformation."

Product Availability
The first in the market to ship code, Cloudera Search is immediately available as a supplemental module for Cloudera Enterprise subscribers.

Additional Information

  • Visit the Cloudera Search page for partner support
  • View the Dell customer video
  • Cloudera is launching the first training course for data analysts to perform real-time analytics and use business intelligence tools directly on petabyte-scale data in Hadoop. Cloudera Data Analyst Training enables users to take advantage of Hadoop's massive scalability and flexibility benefits via SQL and familiar scripting languages. Participants will learn how to use tools like Apache Pig, Apache Hive and Cloudera Impala to achieve breakthrough insights more quickly and for less money, without the pain of migrating data or jumping between silos. Registration is available now via Cloudera University for public and private Data Analyst Training beginning in July.

About Cloudera
Founded in 2008, Cloudera pioneered the business case for Hadoop with CDH, the world's most comprehensive, thoroughly tested and widely deployed 100% open source distribution of Apache Hadoop in both commercial and non-commercial environments. Now, the company is redefining data management with its Platform for Big Data, Cloudera Enterprise, empowering enterprises to Ask Bigger Questions™ and gain rich, actionable insights from all their data, to quickly and easily derive real business value that translates into competitive advantage. As the top contributor to the Apache open source community and leading educator of data professionals with the broadest array of Hadoop training and certification programs, Cloudera also offers comprehensive consulting services. Over 600 partners across hardware, software and services have teamed with Cloudera to help meet organizations' big data goals. With tens of thousands of nodes under management and hundreds of customers across diverse markets, Cloudera is the category leader that has set the standard for Hadoop in the enterprise.

Contact Information
Press Contacts

North America
Hope Nicora
Bhava Communications for Cloudera
[email protected]

Richard Botley
Ketchum Pleon for Cloudera
[email protected]
+44 (0) 20 7611 3788