Software Development - Search Servers and API

Apache Solr

URL Description
Apache Solr - Main Site
Apache Solr - Tutorial
Apache Solr - Wikipedia
Apache Solr: Get Started, Get Excited

Apache Solr is an open-source search server built on the full-text search engine Apache Lucene. In essence, Solr is an HTTP wrapper around the inverted index that Lucene provides. An inverted index can be seen as a list of words in which each word entry links to the documents that contain it. That way, retrieving all documents for the search query "dzone" is a simple 'get' operation.
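
To make the idea of an inverted index concrete, here is a minimal sketch in Python. It is not how Lucene stores its index internally; the function name and the sample documents are made up for illustration, and tokenization is reduced to lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


# Hypothetical sample documents, for illustration only.
docs = {
    1: "DZone publishes articles on Solr",
    2: "Lucene provides the inverted index",
    3: "Solr articles also appear on DZone",
}

index = build_inverted_index(docs)

# Answering the query "dzone" is a single dictionary lookup.
matches = sorted(index.get("dzone", set()))
print(matches)  # prints [1, 3]
```

The lookup cost does not depend on how long the documents are, which is why the query is described as a simple 'get' operation.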

One advantage of Solr in enterprise projects is that you don't need to write any Java code, although Java itself has to be installed. If you are unsure when to use Solr and when to use Lucene, these answers could help. If you need to build your Solr index from websites, take a look at the open-source crawler Apache Nutch before creating your own solution.
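
Because Solr is reached over HTTP, a client in any language can query it. The sketch below only builds the request URL for Solr's select handler and does not send it. The host, the port 8983 (Solr's usual default), and the core name "mycore" are assumptions for illustration; the helper name solr_select_url is hypothetical.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, core, query, rows=10):
    """Build a URL for the select handler of a Solr core (no request is sent)."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/solr/{core}/select?{params}"


# Assumed local installation and a hypothetical core named "mycore".
url = solr_select_url("http://localhost:8983", "mycore", "dzone")
print(url)  # prints http://localhost:8983/solr/mycore/select?q=dzone&rows=10&wt=json
```

Opening such a URL in a browser or fetching it with any HTTP client would return the matching documents, provided a Solr instance with that core is actually running.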

To be convinced that Solr is actually used in a lot of enterprise projects, take a look at this list of public projects powered by Solr. If you encounter problems, the mailing list or Stack Overflow will help you. To complete the introduction, I would like to mention my personal link list and the resources page, which lists books, articles and more interesting material.

Getting Started with Apache Solr
Introduction to Apache Solr
Solr 4 - The NoSQL Database (YouTube) Creator of Solr: Yonik Seeley

Discusses Solr 4, the newest version. Some of its features are:

  • Document Oriented NoSQL Search Platform
  • Data format agnostic (JSON, XML, CSV, binary)
  • Distributed
  • Fault Tolerant (HA + no single points of failure)
  • Atomic Updates
  • Optimistic Concurrency
  • Full-Text search + Hit Highlighting
  • Tons of specialized queries: Faceted search, grouping, pseudo-join, spatial search, functions
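
Two of the features above, atomic updates and optimistic concurrency, can be sketched as a JSON payload. This is an illustration under assumptions, not taken from the talk: the helper name atomic_update, the document id "book-1", the field "price", and the version number are all hypothetical, and the unique key field is assumed to be called "id". The sketch only builds the payload that would be posted to a core's update handler.

```python
import json


def atomic_update(doc_id, field, new_value, expected_version=None):
    """Build a JSON payload that sets a single field of a single document."""
    # The "set" modifier changes one field instead of replacing the whole document.
    doc = {"id": doc_id, field: {"set": new_value}}
    if expected_version is not None:
        # Optimistic concurrency: Solr should reject the update if the
        # stored _version_ of the document differs from this value.
        doc["_version_"] = expected_version
    return json.dumps([doc])


# Hypothetical document id, field name, and version number.
payload = atomic_update("book-1", "price", 99, expected_version=12345)
print(payload)
```

If another client changed the document in the meantime, its stored version no longer matches and the update fails, so the caller can re-read the document and retry.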
Solr in 5 Minutes (YouTube)