Merged
121 changes: 121 additions & 0 deletions docs/reference/dsl_how_to_guides.md
@@ -1425,6 +1425,127 @@ print(response.took)
If you want to inspect the contents of the `response` objects, just use its `to_dict` method to get access to the raw data for pretty printing.


## ES|QL Queries

When working with `Document` classes, you can use the ES|QL query language to retrieve documents. For this you can use the `esql_from()` and `esql_execute()` methods available to all sub-classes of `Document`.

Consider the following `Employee` document definition:

```python
from elasticsearch.dsl import Document, InnerDoc, M

class Address(InnerDoc):
    address: M[str]
    city: M[str]
    zip_code: M[str]

class Employee(Document):
    emp_no: M[int]
    first_name: M[str]
    last_name: M[str]
    height: M[float]
    still_hired: M[bool]
    address: M[Address]

    class Index:
        name = 'employees'
```

The `esql_from()` method creates a base ES|QL query for the index associated with the document class. The following example creates a base query for the `Employee` class:

```python
query = Employee.esql_from()
```

This query includes a `FROM` command with the index name, and a `KEEP` command that retrieves all the document attributes.

To execute this query and receive the results, you can pass the query to the `esql_execute()` method:

```python
for emp in Employee.esql_execute(query):
    print(f"{emp.first_name} {emp.last_name} from {emp.address.city} is {emp.height:.2f}m tall")
```

In this example, the `esql_execute()` class method runs the query and returns all the documents in the index, up to the default limit of 1000 rows that ES|QL applies when a query has no explicit `LIMIT`. Here is a possible output from this example:

```
Kevin Macias from North Robert is 1.60m tall
Drew Harris from Boltonshire is 1.68m tall
Julie Williams from Maddoxshire is 1.99m tall
Christopher Jones from Stevenbury is 1.98m tall
Anthony Lopez from Port Sarahtown is 2.42m tall
Tricia Stone from North Sueshire is 2.39m tall
Katherine Ramirez from Kimberlyton is 1.83m tall
...
```
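The `emp.address.city` access works because `esql_execute()` expands the dotted column names returned by ES|QL (such as `address.city`) into nested dictionaries before building each document. The following is a minimal standalone sketch of that step using only the standard library. The `expand_dotted` helper and the sample row are hypothetical names used for illustration and are not part of the library:

```python
from typing import Any, Dict, List


def expand_dotted(columns: List[str], values: List[Any]) -> Dict[str, Any]:
    """Turn flat columns such as "address.city" into nested dictionaries."""
    doc: Dict[str, Any] = {}
    for col, val in zip(columns, values):
        *parents, leaf = col.split(".")
        d = doc
        for part in parents:
            # create the intermediate dictionary on first use
            d = d.setdefault(part, {})
        d[leaf] = val
    return doc


# Hypothetical row, shaped like one entry of an ES|QL "values" list
columns = ["first_name", "height", "address.city", "address.zip_code"]
row = ["Kevin", 1.6, "North Robert", "12345"]
nested = expand_dotted(columns, row)
print(nested["address"]["city"])
```

The nested dictionary has the shape that a `Document` with an `InnerDoc` attribute expects, which is why the `address` fields can be read as attributes of `emp.address`.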

To search for specific documents, you can extend the base query with additional ES|QL commands that narrow the search criteria. The next example returns only employees who are taller than 2 meters, sorted by last name, and limits the results to 4 people:

```python
query = (
    Employee.esql_from()
    .where(Employee.height > 2)
    .sort(Employee.last_name)
    .limit(4)
)
```

When running this query with the same for-loop shown above, possible results would be:

```
Michael Adkins from North Stacey is 2.48m tall
Kimberly Allen from Toddside is 2.24m tall
Crystal Austin from East Michaelchester is 2.30m tall
Rebecca Berger from Lake Adrianside is 2.40m tall
```

### Additional fields

ES|QL provides a few ways to add new fields to a query, for example through the `EVAL` command. The following example shows a query that adds an evaluated field:

```python
from elasticsearch.esql import E, functions

query = (
    Employee.esql_from()
    .eval(height_cm=functions.round(Employee.height * 100))
    .where(E("height_cm") >= 200)
    .sort(Employee.last_name)
    .limit(10)
)
```

In this example we are adding the height in centimeters to the query, calculated from the `height` document field, which is in meters. The `height_cm` calculated field is available to use in other query clauses, and in particular is referenced in `where()` in this example. Note how the new field is given as `E("height_cm")` in this clause. The `E()` wrapper tells the query builder that the argument is an ES|QL field name and not a string literal. This is done automatically for document fields that are given as class attributes, such as `Employee.height` in the `eval()`. The `E()` wrapper is only needed for fields that are not in the document.

By default, the `esql_execute()` method returns only document instances. To receive any additional fields that are not part of the document in the query results, the `return_additional=True` argument can be passed to it, and then the results are returned as tuples with the document as first element, and a dictionary with the additional fields as second element:

```python
for emp, additional in Employee.esql_execute(query, return_additional=True):
    print(emp.first_name, emp.last_name, additional)
```

Example output from the query given above:

```
Michael Adkins {'height_cm': 248.0}
Kimberly Allen {'height_cm': 224.0}
Crystal Austin {'height_cm': 230.0}
Rebecca Berger {'height_cm': 240.0}
Katherine Blake {'height_cm': 214.0}
Edward Butler {'height_cm': 246.0}
Steven Carlson {'height_cm': 242.0}
Mark Carter {'height_cm': 240.0}
Joseph Castillo {'height_cm': 229.0}
Alexander Cohen {'height_cm': 245.0}
```
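To make the tuple shape concrete, the sketch below mimics how one result row could be split into document fields and additional fields. It is a simplified standard-library illustration and not the library's code. The `split_row` helper and the sample data are hypothetical, and the `_id` column is assumed to be kept aside as document metadata:

```python
from typing import Any, Dict, List, Set, Tuple


def split_row(
    columns: List[str], row: List[Any], doc_fields: Set[str]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Separate the values of one row into document and additional fields."""
    doc_part: Dict[str, Any] = {}
    additional: Dict[str, Any] = {}
    for col, val in zip(columns, row):
        if col in doc_fields:
            doc_part[col] = val
        elif col != "_id":
            # anything that is neither a document field nor the id
            additional[col] = val
    return doc_part, additional


# Hypothetical columns and row for the height_cm query
columns = ["_id", "first_name", "last_name", "height", "height_cm"]
row = ["abc123", "Michael", "Adkins", 2.48, 248.0]
doc_part, additional = split_row(
    columns, row, {"first_name", "last_name", "height"}
)
print(additional)
```

Only `height_cm` ends up in the second dictionary, which matches the shape of the example output above.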

### Missing fields

The base query returned by the `esql_from()` method includes a `KEEP` command with the complete list of fields that are part of the document. If any subsequent clauses added to the query remove fields that are part of the document, then the `esql_execute()` method will raise an exception, because it will not be able to construct complete document instances to return as results.

To prevent errors, it is recommended that the `keep()` and `drop()` clauses are not used when working with `Document` instances.

If a query has missing fields, it can be forced to execute without errors by passing the `ignore_missing_fields=True` argument to `esql_execute()`. When this option is used, returned documents will have any missing fields set to `None`.
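The check behind this behavior can be pictured as a set comparison between the document's fields and the columns returned by the query. The following is a simplified sketch under that assumption. The `check_missing_fields` helper and its error message are hypothetical, and the real method builds its field list differently when `ignore_missing_fields=True`:

```python
from typing import Iterable, Set


def check_missing_fields(
    doc_fields: Iterable[str],
    query_columns: Iterable[str],
    ignore_missing_fields: bool = False,
) -> Set[str]:
    """Return the document fields absent from the query columns.

    Raises ValueError when fields are missing and they are not ignored.
    """
    missing = set(doc_fields) - set(query_columns)
    if missing and not ignore_missing_fields:
        raise ValueError(
            f"Fields missing from the query results: {sorted(missing)}"
        )
    return missing


# Hypothetical scenario: the query dropped the "height" field
doc_fields = ["first_name", "last_name", "height"]
columns_after_drop = ["_id", "first_name", "last_name"]

try:
    check_missing_fields(doc_fields, columns_after_drop)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

missing = check_missing_fields(
    doc_fields, columns_after_drop, ignore_missing_fields=True
)
print(missing)
```

With the flag set, the missing fields are only reported, and a caller would then leave the corresponding document attributes unset.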

## Using asyncio with Elasticsearch Python DSL [asyncio]

18 changes: 16 additions & 2 deletions docs/reference/dsl_tutorials.md
@@ -83,15 +83,15 @@ Let’s have a simple Python class representing an article in a blogging system:

```python
from datetime import datetime
from elasticsearch.dsl import Document, Date, Integer, Keyword, Text, connections
from elasticsearch.dsl import Document, Date, Integer, Keyword, Text, connections, mapped_field

# Define a default Elasticsearch client
connections.create_connection(hosts="https://localhost:9200")

class Article(Document):
title: str = mapped_field(Text(analyzer='snowball', fields={'raw': Keyword()}))
body: str = mapped_field(Text(analyzer='snowball'))
tags: str = mapped_field(Keyword())
tags: list[str] = mapped_field(Keyword())
published_from: datetime
lines: int

@@ -216,6 +216,20 @@ response = ubq.execute()
As you can see, the `Update By Query` object provides many of the savings offered by the `Search` object, and additionally allows one to update the results of the search based on a script assigned in the same manner.


## ES|QL Queries

The DSL module features an integration with the ES|QL query builder, consisting of two methods available in all `Document` sub-classes: `esql_from()` and `esql_execute()`. Using the `Article` document from above, we can search for up to ten articles that include `"world"` in their titles with the following ES|QL query:

```python
from elasticsearch.esql import functions

query = Article.esql_from().where(functions.match(Article.title, 'world')).limit(10)
for a in Article.esql_execute(query):
    print(a.title)
```

Review the [ES|QL Query Builder section](esql-query-builder.md) to learn more about building ES|QL queries in Python.

## Migration from the standard client [_migration_from_the_standard_client]

You don’t have to port your entire application to get the benefits of the DSL module, you can start gradually by creating a `Search` object from your existing `dict`, modifying it using the API and serializing it back to a `dict`:
8 changes: 4 additions & 4 deletions docs/reference/esql-query-builder.md
@@ -20,20 +20,20 @@ The ES|QL Query Builder allows you to construct ES|QL queries using Python synta
You can then see the assembled ES|QL query by printing the resulting query object:

```python
>>> query
>>> print(query)
FROM employees
| SORT emp_no
| KEEP first_name, last_name, height
| EVAL height_feet = height * 3.281, height_cm = height * 100
| LIMIT 3
```

To execute this query, you can cast it to a string and pass the string to the `client.esql.query()` endpoint:
To execute this query, you can pass it to the `client.esql.query()` endpoint:

```python
>>> from elasticsearch import Elasticsearch
>>> client = Elasticsearch(hosts=[os.environ['ELASTICSEARCH_URL']])
>>> response = client.esql.query(query=str(query))
>>> response = client.esql.query(query=query)
```

The response body contains a `columns` attribute with the list of columns included in the results, and a `values` attribute with the list of results for the query, each given as a list of column values. Here is a possible response body returned by the example query given above:
@@ -216,7 +216,7 @@ def find_employee_by_name(name):
.keep("first_name", "last_name", "height")
.where(E("first_name") == E("?"))
)
return client.esql.query(query=str(query), params=[name])
return client.esql.query(query=query, params=[name])
```

Here the part of the query in which the untrusted data needs to be inserted is replaced with a parameter, which in ES|QL is defined by the question mark. When using Python expressions, the parameter must be given as `E("?")` so that it is treated as an expression and not as a literal string.
84 changes: 84 additions & 0 deletions elasticsearch/dsl/_async/document.py
@@ -20,6 +20,7 @@
TYPE_CHECKING,
Any,
AsyncIterable,
AsyncIterator,
Dict,
List,
Optional,
@@ -42,6 +43,7 @@

if TYPE_CHECKING:
from elasticsearch import AsyncElasticsearch
from elasticsearch.esql.esql import ESQLBase


class AsyncIndexMeta(DocumentMeta):
@@ -520,3 +522,85 @@ async def __anext__(self) -> Dict[str, Any]:
return action

return await async_bulk(es, Generate(actions), **kwargs)

    @classmethod
    async def esql_execute(
        cls,
        query: "ESQLBase",
        return_additional: bool = False,
        ignore_missing_fields: bool = False,
        using: Optional[AsyncUsingType] = None,
        **kwargs: Any,
    ) -> AsyncIterator[Union[Self, Tuple[Self, Dict[str, Any]]]]:
        """
        Execute the given ES|QL query and return an iterator of instances of
        this ``Document``. When ``return_additional=True``, the iterator yields
        2-element tuples instead, where the first element is an instance of
        this ``Document`` and the second a dictionary with any remaining
        columns requested in the query.

        :arg query: an ES|QL query object created with the ``esql_from()`` method.
        :arg return_additional: if ``False`` (the default), this method returns
            document objects. If set to ``True``, the method returns tuples with
            a document in the first element and a dictionary with any additional
            columns returned by the query in the second element.
        :arg ignore_missing_fields: if ``False`` (the default), all the fields of
            the document must be present in the query, or else an exception is
            raised. Set to ``True`` to allow missing fields, which will result in
            partially initialized document objects.
        :arg using: connection alias to use, defaults to ``'default'``
        :arg kwargs: additional options for the ``client.esql.query()`` function.
        """
        es = cls._get_connection(using)
        response = await es.esql.query(query=str(query), **kwargs)
        query_columns = [col["name"] for col in response.body.get("columns", [])]

        # Here we get the list of columns defined in the document, which are the
        # columns that we will take from each result to assemble the document
        # object.
        # When `for_esql=False` is passed below by default, the list will include
        # nested fields, which ES|QL does not return, causing an error. When passing
        # `ignore_missing_fields=True` the list will be generated with
        # `for_esql=True`, so the error will not occur, but the documents will
        # not have any Nested objects in them.
        doc_fields = set(cls._get_field_names(for_esql=ignore_missing_fields))
        if not ignore_missing_fields and not doc_fields.issubset(set(query_columns)):
            raise ValueError(
                f"Not all fields of {cls.__name__} were returned by the query. "
                "Make sure your document does not use Nested fields, which are "
                "currently not supported in ES|QL. To force the query to be "
                "evaluated in spite of the missing fields, set the "
                "ignore_missing_fields=True option in the esql_execute() call."
            )
        non_doc_fields: set[str] = set(query_columns) - doc_fields - {"_id"}
        index_id = query_columns.index("_id")

        results = response.body.get("values", [])
        for column_values in results:
            # create a dictionary with all the document fields, expanding the
            # dot notation returned by ES|QL into the recursive dictionaries
            # used by Document.from_dict()
            doc_dict: Dict[str, Any] = {}
            for col, val in zip(query_columns, column_values):
                if col in doc_fields:
                    cols = col.split(".")
                    d = doc_dict
                    for c in cols[:-1]:
                        if c not in d:
                            d[c] = {}
                        d = d[c]
                    d[cols[-1]] = val

            # create the document instance
            obj = cls(meta={"_id": column_values[index_id]})
            obj._from_dict(doc_dict)

            if return_additional:
                # build a dict with any other values included in the response
                other = {
                    col: val
                    for col, val in zip(query_columns, column_values)
                    if col in non_doc_fields
                }

                yield obj, other
            else:
                yield obj