elasticsearch: first things first

elasticsearch is a search server built on top of Lucene that adds distributed, scalable search, and it is a good fit for Amazon’s EC2. The fun part about elasticsearch is how little installation and setup it takes (on a Mac):

step 1) download it.
step 2) unzip that file.
step 3) go to the directory it unpacked into and run bin/elasticsearch

Done. Ready to go. Proceed with inserting data.

Except! (There’s always a catch.) At startup, elasticsearch goes out and looks around your local network for other machines running elasticsearch. If it finds one with the same cluster name, it’ll hook up, and poof! Two separate elasticsearch nodes become one cluster. Any data you insert will be visible to your fellow developers who are also fooling around, and theirs will be visible to you. Or to testers, if it’s running on a test box. This can freak you out when you’re not expecting it. Therefore:

step 2a) in config/elasticsearch.yml, the first setting is cluster.name. Change the value from “elasticsearch” to something uniquely yours.
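
If you'd rather script that edit than open an editor, here is a minimal sketch in shell. The function name set_cluster_name and the example name my-laptop-sandbox are made up for illustration. It assumes the cluster.name line starts at the beginning of a line (possibly commented out with a leading #) and that the new name contains no slashes or ampersands, since those would confuse sed.

```shell
# Hypothetical helper: rewrite the cluster.name line in an elasticsearch.yml file.
# Handles both an active line and one that is commented out with "#".
set_cluster_name() {
  config_file="$1"
  new_name="$2"
  tmp_file="$(mktemp)" || return 1
  # Write the edited config to a temp file, then copy it back over the original.
  sed -E "s/^#?[[:space:]]*cluster\.name:.*/cluster.name: ${new_name}/" "$config_file" > "$tmp_file" \
    && cat "$tmp_file" > "$config_file"
  status=$?
  rm -f "$tmp_file"
  return "$status"
}

# Usage (the name is a placeholder; pick your own):
# set_cluster_name config/elasticsearch.yml my-laptop-sandbox
```

Going through a temp file instead of sed -i is deliberate: the -i flag takes different arguments on a Mac than on Linux.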

The config is only read at startup, so if elasticsearch is already running when you change it, you’ll need to shut it down:

curl localhost:9200/_shutdown -XPOST

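then run ‘ps -ef | grep elasticsearch’ to make sure it’s dead (ignore the line for the grep command itself). Kill the process if it is still running, then start it again with bin/elasticsearch so it picks up the new cluster name. (Yes, you can disable that easy-peasy shutdown command for production.)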
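
Once elasticsearch is running again with the new config, you may want to confirm that it really came up under the new name. Here is a rough sketch, assuming the _cluster/health endpoint is reachable on localhost:9200 and reports a cluster_name field. The helper name cluster_name_from_json is made up, and the sed pattern is a quick hack, not a real JSON parser.

```shell
# Hypothetical helper: pull the cluster_name value out of JSON read on stdin.
cluster_name_from_json() {
  sed -n 's/.*"cluster_name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage against a running node (assumes the default port 9200):
# curl -s localhost:9200/_cluster/health | cluster_name_from_json
```

If this still prints “elasticsearch”, the old process probably never died, or the config edit didn't land in the file the node is actually reading.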

Like any framework, elasticsearch is very easy to use as long as you stick with the default settings. To use elasticsearch for real, you need to think about indexes, documents, mappings, analyzers, filters, nodes, clusters, shards, and probably more concepts I haven’t encountered yet. However, the reasonable defaults and smart initialization mean that you don’t have to think about any of these right away. We can get the product up and running for proof of concept with very little effort and customize it in our own sweet time.