Add a search function to your application with ElasticSearch in under 10 minutes

Published 9/12/2015 6:39:00 AM
Filed under Elastic Search

I personally think that one of the most important, yet underrated features of any
enterprise web application is search. Most people think about it, but then go ahead
and use things like SQL full-text search or some other form of search that isn't
really a search engine.

This is a shame, because you can offer your users so much more than a complex search
form that makes them fill in 5 fields and gives them nothing in return.

I think that proper search can easily shave minutes off any workflow in your
application, because people can find what they need much faster.

In this post I will walk you through the process of installing ElasticSearch and
getting your first data indexed so it can be searched. As a bonus, you can find
links to get even more out of your search functionality, so here goes.

Getting things ready for search

Before you can start to index data and search for it, you need to have
ElasticSearch installed. You can download ElasticSearch from the Elastic website.
Extract it somewhere on disk to get started.

Before you fire up ElasticSearch I recommend that you make one edit in the
configuration file (config/elasticsearch.yml). Please, pretty please, change the
cluster.name setting to something other than the default, 'elasticsearch'.

Why, you ask? Because if someone else on the same network is also running a cluster
with the default configuration, your node and theirs end up joining the same cluster,
causing all sorts of interesting problems for you and the other person.

After you have properly configured ElasticSearch, fire it up by executing bin/elasticsearch.

Indexing data in your website

With ElasticSearch up and running you can move on to indexing the data in your website.
How you do this depends largely on the kind of data you have, but the general idea is this:

When someone creates or updates data, you need to index that data in ElasticSearch.
To do this, modify your controllers so that they invoke the indexing operation.

Before you can do that though, you need to define how your data is indexed.
This is done by sending a mapping specification to the ElasticSearch server.

You can skip this part, but I suggest you take a minute and define the mapping
for your index. This will save you a headache later. The mapping determines how
well you can find the data later.

For this post I'm demonstrating how you could build a search function for a weblog,
so the mapping request will look like this:

POST /weblog
{
    "mappings": {
        "post": {
            "properties": {
                "title": {
                    "type": "string",
                    "index": "analyzed"
                },
                "body": {
                    "type": "string",
                    "index": "analyzed"
                },
                "tags": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}

This request creates the index weblog and, within that index, a type named post.
A post has a title, a body and tags. The title and body are analyzed, while the tags are not.
Analyzing a field means that you can enter a search query for that field and the search engine
will find hits even if the text doesn't match 100%. It then sorts the results according to
how well they match your search query.

For tags I chose not to analyze the field, so that I can match tags exactly. This
is useful when you want to get all posts for a specific tag from the search index.
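To make the difference between analyzed and not_analyzed a bit more tangible, here is a toy illustration in plain C#. It is not what ElasticSearch does internally. The Analyze method is a made-up stand-in that only lower-cases the text and splits it into words, which is roughly the idea behind a default analyzer. All names in this sketch are hypothetical.

```csharp
using System;
using System.Linq;

public static class AnalysisDemo
{
    // Toy stand-in for an analyzer: lower-case the text and split it
    // on anything that is not a letter or a digit.
    public static string[] Analyze(string text)
    {
        var cleaned = new string(text.ToLowerInvariant()
            .Select(c => char.IsLetterOrDigit(c) ? c : ' ')
            .ToArray());

        return cleaned.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // Analyzed field: field value and query are both broken into terms,
    // and a single shared term is enough for a hit.
    public static bool MatchesAnalyzed(string fieldValue, string query)
    {
        var fieldTerms = Analyze(fieldValue);
        return Analyze(query).Any(term => fieldTerms.Contains(term));
    }

    // Field that is not analyzed: the whole value is kept as one term
    // and has to match the query exactly, including casing.
    public static bool MatchesExact(string fieldValue, string query)
    {
        return string.Equals(fieldValue, query, StringComparison.Ordinal);
    }
}
```

With this sketch, MatchesAnalyzed("Getting started with ElasticSearch", "elasticsearch tutorial") is a hit because both sides contain the term elasticsearch, while MatchesExact("ElasticSearch", "elasticsearch") is not, because the casing differs. That second behavior is what you want for tags.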

After defining a mapping you can start to write the indexing code.
To do this you can use the Nest NuGet package. To install this package
execute the following command:

dnu install Nest

The Nest package is the official client library offered by Elastic to use ElasticSearch
from C#. This package is easy to use and provides most of the basic stuff you need
to talk to ElasticSearch. If, in the odd case, something isn't available through
the Nest library, you can always drop down to the Elasticsearch.Net library on which
Nest is built. So in short, the Nest library is the library to have
when working with ElasticSearch from C#.

With the package installed you can go ahead and build the indexing.
The indexing code is rather simple, but it's worth some explaining.

using System;
using System.Threading.Tasks;
using Weblog.Models;
using Nest;
using Polly;

namespace Weblog.Services
{
    public class PostIndexer: IPostIndexer
    {
        private ElasticClient _client;

        public PostIndexer()
        {
            var node = new Uri("http://localhost:9200");
            var settings = new ConnectionSettings(node);

            _client = new ElasticClient(settings);
        }

        public async Task IndexAsync(Post post)
        {
            // Indexes the content in ElasticSearch
            // in the weblog index. Uses the post type
            // mapping defined earlier.
            await _client.IndexAsync(post,
                (indexSelector) => indexSelector
                    .Index("weblog").Type("post"));
        }
    }
}

To talk to ElasticSearch you have to create a new instance of the ElasticClient class.
This class requires a set of connection settings. These settings define the server
you want to connect to and some basic options for communicating with it.

With the ElasticClient instantiated you can start to index information. For this
you can use the method IndexAsync on the client. I've wrapped this in a public method
on the indexer class to make access to this method more straightforward.

In the IndexAsync call you specify what should be indexed and how. The first
parameter is the data to be indexed. The second parameter tells the client to
index the content in the weblog index. The type specified here corresponds directly to the
mapping defined earlier.
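To show where the indexer fits in, here is a minimal sketch of the save-then-index flow. Everything in it is an assumption for illustration: the Post properties simply mirror the fields in the mapping, the IPostIndexer interface is my guess at the shape the PostIndexer class implements, PostEditor stands in for your controller action, and RecordingIndexer is a fake that lets you try the flow without a running server.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical model. The property names mirror the fields in the mapping.
public class Post
{
    public string Title { get; set; }
    public string Body { get; set; }
    public List<string> Tags { get; set; } = new List<string>();
}

// Assumed shape of the interface that PostIndexer implements.
public interface IPostIndexer
{
    Task IndexAsync(Post post);
}

// Stand-in for the controller action or service that handles create and update.
public class PostEditor
{
    private readonly IPostIndexer _indexer;

    // Placeholder for your real data store.
    private readonly List<Post> _database = new List<Post>();

    public PostEditor(IPostIndexer indexer)
    {
        _indexer = indexer;
    }

    public async Task SaveAsync(Post post)
    {
        // 1. Store the post the way you already do.
        _database.Add(post);

        // 2. Push the same post to the search index so it becomes searchable.
        await _indexer.IndexAsync(post);
    }
}

// In-memory indexer for trying the flow without a running server.
public class RecordingIndexer : IPostIndexer
{
    public List<Post> Indexed { get; } = new List<Post>();

    public Task IndexAsync(Post post)
    {
        Indexed.Add(post);
        return Task.FromResult(0);
    }
}
```

In the real application you would pass the PostIndexer from this post to the constructor instead of the RecordingIndexer. Indexing after the save means a failed save never ends up in the search index.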

You can specify a lot more when indexing content, but the index and type are all that you
need for basic indexing operations. When you index content, it's important to make sure
that the index has been created and the type mapping configured ahead of time.
This will save you from all sorts of weird search problems later. Also, make sure that the
object you are indexing can be serialized as JSON, because that is what the ElasticClient
does when you invoke IndexAsync.
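One way to make sure the index and mapping exist ahead of time is to check for them when your application starts. The sketch below uses a plain HttpClient rather than Nest, so it only relies on the REST calls themselves. It assumes the same ElasticSearch version and mapping as the request shown earlier in this post. The class name, method name and URL are made up for this example.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class WeblogIndexSetup
{
    // Same mapping as in the request shown earlier in this post.
    public const string MappingJson = @"{
  ""mappings"": {
    ""post"": {
      ""properties"": {
        ""title"": { ""type"": ""string"", ""index"": ""analyzed"" },
        ""body"":  { ""type"": ""string"", ""index"": ""analyzed"" },
        ""tags"":  { ""type"": ""string"", ""index"": ""not_analyzed"" }
      }
    }
  }
}";

    // indexUrl is assumed to look like http://localhost:9200/weblog
    public static async Task EnsureIndexAsync(HttpClient http, string indexUrl)
    {
        // A HEAD request on the index URL answers 200 when the index
        // exists and 404 when it does not.
        var exists = await http.SendAsync(
            new HttpRequestMessage(HttpMethod.Head, indexUrl));

        if (exists.IsSuccessStatusCode)
        {
            return;
        }

        // The index is missing, so create it together with the mapping.
        var body = new StringContent(MappingJson, Encoding.UTF8, "application/json");
        var created = await http.PostAsync(indexUrl, body);
        created.EnsureSuccessStatusCode();
    }
}
```

You could call this once during startup, for example with await WeblogIndexSetup.EnsureIndexAsync(new HttpClient(), "http://localhost:9200/weblog"). It needs a running ElasticSearch server to do anything useful.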

Searching for data on your website

With the data indexed you can move on to the search functionality. I think everyone
who has done ASP.NET development before is capable of producing a working controller,
so let's focus on building the search functionality itself.

The idea is to allow users to search for content using a set of terms they entered
in the search box on the page. Search results should be returned in a paged manner, so
that users can browse through the results. Returning every hit in a single response
gets slow and memory-hungry as your index grows, so paging isn't only something you do
to make things easier for the user. It keeps the performance high as well.

The search code looks like this:

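using System;
using System.Threading.Tasks;
using Nest;
using Polly;

// IndexedPost, PagedResult<T> and IPostSearcher are application types
// and are assumed to be in scope here.
public class PostSearcher: IPostSearcher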
{
	private static Policy CircuitBreaker = Policy
		.Handle<Exception>()
		.CircuitBreakerAsync(3, TimeSpan.FromSeconds(60));

	private ElasticClient _client;		

	public PostSearcher()
	{
		var node = new Uri("http://localhost:9200");
		var settings = new ConnectionSettings(node);

		_client = new ElasticClient(settings);
	}

	public async Task<PagedResult<IndexedPost>> FindPostsAsync(string query, int pageIndex)
	{
		// Use a circuit breaker to make the search operation more resilient against problems.
		// When this operation fails three times, we stop for a minute before trying again.
		return await CircuitBreaker.ExecuteAsync(async () => {
			// Important: For easy search, stick to the query_string operator.
			// This will automatically convert your query string into terms and search for them.
			// Doing this manually is possible, but more difficult.
			var results = await _client.SearchAsync<IndexedPost>(searchRequest => searchRequest
				.Index("weblog")
				.Type("post")
				.From(pageIndex * 30)
				.Take(30)
				.Query(querySpec => querySpec.QueryString(
					queryString => queryString.DefaultField(post => post.Body).Query(query))));

			return new PagedResult<IndexedPost> {
				Items = results.Documents,
				PageSize = 30,
				PageIndex = pageIndex,
				Total = results.Total
			};
		});

	}
}

Again you start out with a basic ElasticClient setup. After that you can start
to search for content using the SearchAsync method. This method accepts a generic argument
which tells the ElasticClient class how the _source property of each search result should
be deserialized. When you search for content in ElasticSearch it returns a set of basic
properties for each document and a _source property. This _source property contains
the data that was serialized in the IndexAsync method earlier. So a top tip: Use the same
type in the SearchAsync method as you used in the IndexAsync method.

The SearchAsync call needs to be told which index and type to search, which
is done by invoking the corresponding methods.

To page the results, invoke the From method with the offset of the first result to
get. After that, invoke the Take method with the number of results you want to get.
Finally, you need to specify the search query.
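The paging arithmetic behind From and Take is small enough to put in a helper. This is a sketch under my own assumptions: the page index is zero-based, the page size is the 30 used in the search code, and the Paging class and its Offset method are names I made up for this example.

```csharp
using System;

public static class Paging
{
    // Same page size as the search code in this post.
    public const int PageSize = 30;

    // Zero-based page index in, offset of the first hit on that page out.
    public static int Offset(int pageIndex)
    {
        // Treat negative page numbers (for example from a hand-edited URL)
        // as the first page instead of sending a negative offset.
        return Math.Max(pageIndex, 0) * PageSize;
    }
}
```

Paging.Offset(0) gives 0 and Paging.Offset(2) gives 60, so the third page starts at the 61st result.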

The search query uses the query_string operator. This search operator takes a
string of text and a default field. ElasticSearch will automatically convert
your search string into terms that can be matched, using a default analyzer.

The query_string operator is one of the easiest query operators to use, but
it is also the most limited operator. I would invest in writing proper query parsing
logic if you're going to use ElasticSearch for something more than just basic search.
But if you want to build a quick search function, this is the way to go.

I've added the PagedResult conversion in this method to make the search results
easier to render on the page. With this conversion added you're done integrating
search in your website.
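The PagedResult class itself is not shown in this post, so here is a minimal sketch of what it could look like. The four settable properties match the ones used in the search code. TotalPages and HasNextPage are extras I added for this example to make rendering a pager easier.

```csharp
using System;
using System.Collections.Generic;

public class PagedResult<T>
{
    // The documents on the current page.
    public IEnumerable<T> Items { get; set; }

    public int PageSize { get; set; }

    // Zero-based index of the current page.
    public int PageIndex { get; set; }

    // Total number of hits for the query, across all pages.
    public long Total { get; set; }

    // Number of pages needed to show all hits, rounded up.
    public long TotalPages =>
        PageSize <= 0 ? 0 : (Total + PageSize - 1) / PageSize;

    // True when there is at least one more page after the current one.
    public bool HasNextPage => PageIndex + 1 < TotalPages;
}
```

With 61 hits and a page size of 30 this gives three pages, and HasNextPage is false once PageIndex reaches 2.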

Final thoughts

Adding search is so simple, I am pretty sure that most developers with a little skill
in C# are going to be able to get this running properly.

If you're interested in more information, I can recommend the ElasticSearch book
on the Elastic website.

More important though: Experiment, try it out. You will love it!