
Fix run_in_spark crash with new-style services (APIMethod)#5549

Open
Mr-Neutr0n wants to merge 2 commits into bentoml:main from Mr-Neutr0n:fix/spark-batch-apimethod-compat

Conversation

@Mr-Neutr0n

Summary

Fixes #5524.

bentoml.batch.run_in_spark() crashes with AttributeError: 'APIMethod' object has no attribute 'input' when used with services defined via the @bentoml.service() decorator. The Spark batch code was written for the legacy InferenceAPI interface and wasn't updated after the service API refactor.

Changes

  • Detect service style at runtime — added _is_legacy_service() to distinguish between legacy Service (with InferenceAPI) and new-style services (with APIMethod).
  • Legacy services continue to use InferenceAPI.input/output IO descriptors and the existing HTTPClient, preserving full backward compatibility.
  • New-style services use SyncHTTPClient from _bentoml_impl.client (which auto-discovers endpoints from the running server's /schema.json) and convert Arrow RecordBatches through pandas. A _result_to_record_batch() helper handles converting common return types (ndarray, list, dict, DataFrame, etc.) back to Arrow RecordBatch format.
  • output_schema is now required for new-style services, since APIMethod doesn't expose spark_schema(). A clear error message is raised if it's missing.

Testing

Verified the fix against the reproduction case from the issue (service with @bentoml.task taking np.ndarray input, called via run_in_spark with an explicit output_schema).

The Spark batch processing code in spark.py was written for the legacy
InferenceAPI interface which has .input and .output IO descriptors with
from_arrow/to_arrow methods. After the service API refactor, services
using the @bentoml.service() decorator produce APIMethod objects that
don't have these attributes, causing an AttributeError at runtime.

This patch adds detection for legacy vs new-style services and handles
each appropriately:

- Legacy services continue to use InferenceAPI.input/output with the
  existing HTTPClient, preserving backward compatibility.
- New-style services use SyncHTTPClient (which auto-discovers endpoints
  from the server's schema) and handle Arrow RecordBatch conversion
  through pandas, with a helper that converts common return types
  (ndarray, list, dict, DataFrame) back to RecordBatch.
- output_schema is now required for new-style services since APIMethod
  doesn't expose spark_schema().

Fixes bentoml#5524
