ElasticsearchRetriever
Elasticsearch is a distributed, RESTful search and analytics engine. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. It supports keyword search, vector search, hybrid search and complex filtering.
The ElasticsearchRetriever is a generic wrapper to enable flexible access to all Elasticsearch features through the Query DSL. For most use cases the other classes (ElasticsearchStore, ElasticsearchEmbeddings, etc.) should suffice, but if they don't you can use ElasticsearchRetriever.
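To make "access through the Query DSL" concrete, here is a minimal sketch of a function that builds a Query DSL request body as a plain Python dict. The function name match_query and the default field name "text" are hypothetical choices for this illustration. A function of this shape is the kind of callable the retriever is generally configured with to turn a search string into a request body; check the API reference for the exact parameter it expects.

```python
from typing import Any, Dict


def match_query(search_query: str, field: str = "text") -> Dict[str, Any]:
    """Build a Query DSL body for a simple full-text match query.

    `field` is an assumed field name; use whatever field your index stores text in.
    """
    return {
        "query": {
            "match": {
                field: search_query,
            }
        }
    }


body = match_query("hello world")
print(body)
```

Because the body is an ordinary dict, any Query DSL feature (filters, kNN clauses, boosting) can be added by extending the returned structure.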
This guide will help you get started with the Elasticsearch retriever. For detailed documentation of all ElasticsearchRetriever features and configurations, head to the API reference.
Integration details
| Retriever | Self-host | Cloud offering | Package |
|---|---|---|---|
| ElasticsearchRetriever | ✅ | ✅ | langchain_elasticsearch |
Setup
There are two main ways to set up an Elasticsearch instance:
- Elastic Cloud: Elastic Cloud is a managed Elasticsearch service. Sign up for a free trial.
- Local Elasticsearch install: Get started with Elasticsearch by running it locally. The easiest way is to use the official Elasticsearch Docker image. See the Elasticsearch Docker documentation for more information. To connect to an Elasticsearch instance that does not require login credentials (for example, a Docker instance started with security disabled), pass the Elasticsearch URL and index name along with the embedding object to the constructor.
If you want automated tracing of individual queries, you can also set your LangSmith API key by uncommenting the lines below:
# import getpass
# import os
#
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
# os.environ["LANGSMITH_TRACING"] = "true"
Installation
This retriever lives in the langchain-elasticsearch package. For demonstration purposes, we will also install langchain-community to generate text embeddings.
%pip install -qU langchain-community langchain-elasticsearch
from typing import Any, Dict, Iterable
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk
from langchain_community.embeddings import DeterministicFakeEmbedding
from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_elasticsearch import ElasticsearchRetriever
Configure
Here we define the connection to Elasticsearch. In this example we use a locally running instance. Alternatively, you can create an account in Elastic Cloud and start a free trial.
es_url = "http://localhost:9200"
es_client = Elasticsearch(hosts=[es_url])
es_client.info()
For vector search, we are going to use random embeddings just for illustration. For real use cases, pick one of the available LangChain Embeddings classes.
embeddings = DeterministicFakeEmbedding(size=3)
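To illustrate what "deterministic fake" embeddings mean in practice, the sketch below derives a pseudo-random vector from a hash of the input text. This is not the library's implementation; the function name fake_embed and the hashing scheme are assumptions made purely for illustration. The point is that the same text always yields the same vector, while the vector carries no semantic meaning.

```python
import hashlib
import random
from typing import List


def fake_embed(text: str, size: int = 3) -> List[float]:
    """Return a pseudo-random vector that depends only on the input text."""
    # Derive a stable seed from the text. Python's built-in hash() is salted
    # per process, so a cryptographic digest is used instead.
    seed = int(hashlib.sha256(text.encode("utf-8")).hexdigest(), 16) % (2**32)
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


vector = fake_embed("foo")
print(len(vector))                 # one float per dimension
print(vector == fake_embed("foo"))  # same text, same vector
```

Since such vectors are unrelated to meaning, nearest-neighbor results built on them are only useful for checking that the plumbing works.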
Define example data
index_name = "test-langchain-retriever"
text_field = "text"
dense_vector_field = "fake_embedding"
num_characters_field = "num_characters"
texts = [
    "foo",
    "bar",
    "world",
    "hello world",
    "hello",
    "foo bar",
    "bla bla foo",
]
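As a rough sketch of how the field names above could fit together, the following generator turns each text into one document in the action format accepted by the bulk helper of the Elasticsearch Python client (a dict with _index and _id metadata, with the remaining keys treated as the document source). The function name build_actions and the placeholder vectors are assumptions for illustration; in a real run the vectors would come from the embeddings object.

```python
from typing import Any, Dict, Iterable, List

index_name = "test-langchain-retriever"
text_field = "text"
dense_vector_field = "fake_embedding"
num_characters_field = "num_characters"


def build_actions(
    texts: List[str], vectors: List[List[float]]
) -> Iterable[Dict[str, Any]]:
    """Yield one bulk-index action per text, pairing it with its vector."""
    for i, (text, vector) in enumerate(zip(texts, vectors)):
        yield {
            "_index": index_name,
            "_id": str(i),
            text_field: text,
            dense_vector_field: vector,
            num_characters_field: len(text),
        }


sample_texts = ["foo", "hello world"]
# Placeholder vectors (assumption); substitute real embeddings here.
placeholder_vectors = [[0.0, 0.0, 0.0] for _ in sample_texts]
actions = list(build_actions(sample_texts, placeholder_vectors))
print(actions[1])
```

Storing the character count alongside the text gives a numeric field that can be used for filtering or sorting in later queries.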