-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Description
Hi, I'm opening this issue to report something likely to be a bug.
When creating a SearchRequest
for searchForStream
method, maxResults
parameter is overwriting size
, while it seems like it shouldn't be.
Bug
I set both maxResult
and Pageable.size
(will refer to pageSize
) fields for the query and passes it to SearchOperations.searchForStream.
When requestConverter.searchRequest
inside the method creates a SearchRequest
, maxResult
overwrites pageSize
.
In the below code, builder.size
is called first with query.getPageable().getPageSize()
, followed by another call with query.getMaxResults()
, overwriting the result of the first call.
Lines 1469 to 1492 in 0e5af90
if (query.getPageable().isPaged()) { | |
builder // | |
.from((int) query.getPageable().getOffset()) // | |
.size(query.getPageable().getPageSize()); | |
} else { | |
builder.from(0).size(INDEX_MAX_RESULT_WINDOW); | |
} | |
if (!isEmpty(query.getFields())) { | |
var fieldAndFormats = query.getFields().stream().map(field -> FieldAndFormat.of(b -> b.field(field))).toList(); | |
builder.fields(fieldAndFormats); | |
} | |
if (!isEmpty(query.getStoredFields())) { | |
builder.storedFields(query.getStoredFields()); | |
} | |
if (query.getIndicesOptions() != null) { | |
addIndicesOptions(builder, query.getIndicesOptions()); | |
} | |
if (query.isLimiting()) { | |
builder.size(query.getMaxResults()); | |
} |
I was stuck at the case where,
- 250,000 data available in the ES index
- Want to fetch at most 150,000 documents in total (out of 250k), using
SearchOperations.searchForStream
(maxResult 150000
) - Want to fetch 10,000 documents per page (per scroll search request), which fills almost all the response memory buffer (
Pageable.ofSize(10000)
)
result
- Expected : 15 search requests in total, a single search returning a page of 10k documents
- Actual : a single search tries to fetch 150k documents, overflowing response memory buffer and causing below Exception
Caused by: org.springframework.dao.DataAccessResourceFailureException: entity content is too long [169386197] for the configured buffer limit [104857600]; nested exception is java.lang.RuntimeException: entity content is too long [169386197] for the configured buffer limit [104857600]
Possible solution
I think two parameters have totally separate roles, thus one shouldn't overwrite the other.
pageSize
parameter should decide how many documents that I want ES to fetch per page,- while
maxResult
parameter decides how many documents I want to fetch in total during the whole stream operation.
👉 Therefore, the actual pageSize should be Min(maxResult, pageSize)
if maxResult
is set
Changes made : https://github.com/hy2850/spring-data-elasticsearch/commits/%233089-size-and-maxResults/
Further explanation with code for your understanding
SearchOperations.searchForStream
returns a CloseableIterator that uses scroll API to search through a large set of data by pages.
Currently, maxResult
is solely used to get maxCount
for this method.
Lines 410 to 423 in 0e5af90
@Override | |
public <T> SearchHitsIterator<T> searchForStream(Query query, Class<T> clazz, IndexCoordinates index) { | |
Duration scrollTime = query.getScrollTime() != null ? query.getScrollTime() : Duration.ofMinutes(1); | |
long scrollTimeInMillis = scrollTime.toMillis(); | |
// noinspection ConstantConditions | |
int maxCount = query.isLimiting() ? query.getMaxResults() : 0; | |
return StreamQueries.streamResults( // | |
maxCount, // | |
searchScrollStart(scrollTimeInMillis, query, clazz, index), // | |
scrollId -> searchScrollContinue(scrollId, scrollTimeInMillis, clazz, index), // | |
this::searchScrollClear); | |
} |
The iterator keeps track of how many documents it has fetched so far, in currentCount
instance variable.
Lines 130 to 137 in 0e5af90
@Override | |
public SearchHit<T> next() { | |
if (hasNext()) { | |
currentCount.incrementAndGet(); | |
return currentScrollHits.next(); | |
} | |
throw new NoSuchElementException(); | |
} |
When currentCount
reachesmaxCount
, the iterator is forced to return false for hasNext
, even when there could be more documents available for searching.
Lines 107 to 116 in 0e5af90
@Override | |
public boolean hasNext() { | |
boolean hasNext = false; | |
if (!isClosed && continueScroll && (maxCount <= 0 || currentCount.get() < maxCount)) { | |
if (!currentScrollHits.hasNext()) { | |
SearchScrollHits<T> nextPage = continueScrollFunction.apply(scrollState.getScrollId()); | |
currentScrollHits = nextPage.iterator(); |
So, I reached a conclusion that maxResult
should only decide maxCount
, not the pageSize
, and this is a bug.