Fix run_in_spark crash with new-style services (APIMethod) by Mr-Neutr0n · Pull Request #5549 · bentoml/BentoML

Mr-Neutr0n · 2026-02-13T11:42:01Z

Summary

bentoml.batch.run_in_spark() crashes with AttributeError: 'APIMethod' object has no attribute 'input' when used with services defined via the @bentoml.service() decorator. The Spark batch code was written for the legacy InferenceAPI interface and wasn't updated after the service API refactor.

Changes

- Detect service style at runtime: added _is_legacy_service() to distinguish between legacy Service (with InferenceAPI) and new-style services (with APIMethod).
- Legacy services continue to use InferenceAPI.input/output IO descriptors and the existing HTTPClient, preserving full backward compatibility.
- New-style services use SyncHTTPClient from _bentoml_impl.client (which auto-discovers endpoints from the running server's /schema.json) and convert Arrow RecordBatches through pandas. A _result_to_record_batch() helper converts common return types (ndarray, list, dict, DataFrame, etc.) back to Arrow RecordBatch format.
- output_schema is now required for new-style services, since APIMethod doesn't expose spark_schema(). A clear error message is raised if it's missing.
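
The detection and the output_schema requirement described above can be sketched roughly as follows. This is not the code from the patch: LegacyAPI, NewStyleAPI, FakeService, is_legacy_service and resolve_output_schema are hypothetical stand-ins so the example runs without BentoML installed, and the attribute check is an assumption about how the two service styles could be told apart.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class LegacyAPI:
    """Hypothetical stand-in for a legacy InferenceAPI with IO descriptors."""
    input: Any
    output: Any


@dataclass
class NewStyleAPI:
    """Hypothetical stand-in for APIMethod: no .input / .output attributes."""
    name: str


@dataclass
class FakeService:
    """Hypothetical container mapping API names to API objects."""
    apis: dict = field(default_factory=dict)


def is_legacy_service(svc: FakeService) -> bool:
    # Assumption: a service is "legacy" when every API exposes both IO descriptors.
    return bool(svc.apis) and all(
        hasattr(api, "input") and hasattr(api, "output")
        for api in svc.apis.values()
    )


def resolve_output_schema(svc: FakeService, output_schema: Optional[Any] = None) -> Optional[Any]:
    if is_legacy_service(svc):
        # Legacy path: a schema can be derived from the IO descriptor,
        # so a missing output_schema is acceptable here.
        return output_schema
    if output_schema is None:
        # New-style path: nothing to derive a schema from, so fail early.
        raise ValueError(
            "output_schema is required when the service uses new-style APIs"
        )
    return output_schema
```

Checking for the attributes rather than importing and comparing classes keeps the sketch independent of BentoML's module layout; the actual patch may well use a different test.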

Testing

Verified the fix against the reproduction case from the issue (service with @bentoml.task taking np.ndarray input, called via run_in_spark with an explicit output_schema).

The Spark batch processing code in spark.py was written for the legacy InferenceAPI interface, which has .input and .output IO descriptors with from_arrow/to_arrow methods. After the service API refactor, services using the @bentoml.service() decorator produce APIMethod objects that don't have these attributes, causing an AttributeError at runtime.

This patch adds detection for legacy vs new-style services and handles each appropriately:

- Legacy services continue to use InferenceAPI.input/output with the existing HTTPClient, preserving backward compatibility.
- New-style services use SyncHTTPClient (which auto-discovers endpoints from the server's schema) and handle Arrow RecordBatch conversion through pandas, with a helper that converts common return types (ndarray, list, dict, DataFrame) back to RecordBatch.
- output_schema is now required for new-style services since APIMethod doesn't expose spark_schema().

Fixes bentoml#5524
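
To illustrate the return-type handling mentioned in the commit message, here is a minimal, dependency-free sketch of the normalization step. It is not the patch's _result_to_record_batch(): result_to_columns is a hypothetical name, the output is a plain column mapping rather than an Arrow RecordBatch, and DataFrame results are not covered. Array-like results are recognized by duck typing on tolist().

```python
from typing import Any


def result_to_columns(result: Any, column_names: list[str]) -> dict[str, list]:
    """Normalize a service return value into a {column name: values} mapping."""
    # Array-like results (numpy.ndarray, for instance) expose tolist().
    if hasattr(result, "tolist"):
        result = result.tolist()

    if isinstance(result, dict):
        # Mapping of column -> values; wrap bare scalars in a one-element list.
        columns = {
            key: list(value) if isinstance(value, (list, tuple)) else [value]
            for key, value in result.items()
        }
    elif isinstance(result, (list, tuple)):
        rows = list(result)
        if rows and all(isinstance(row, (list, tuple)) for row in rows):
            # 2-D result: each inner sequence is one row; transpose to columns.
            if any(len(row) != len(column_names) for row in rows):
                raise ValueError("row width does not match the number of columns")
            columns = {
                name: [row[i] for row in rows]
                for i, name in enumerate(column_names)
            }
        else:
            # 1-D result: a single column of values.
            if len(column_names) != 1:
                raise ValueError("1-D result needs exactly one output column")
            columns = {column_names[0]: rows}
    else:
        # Scalar result: one row, one column.
        if len(column_names) != 1:
            raise ValueError("scalar result needs exactly one output column")
        columns = {column_names[0]: [result]}

    if len({len(values) for values in columns.values()}) > 1:
        raise ValueError("columns have unequal lengths")
    return columns
```

A mapping in this shape is what pyarrow.RecordBatch.from_pydict accepts, so a real helper could hand the result to that call together with a schema derived from output_schema.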

Mr-Neutr0n requested a review from a team as a code owner February 13, 2026 11:42

Mr-Neutr0n requested review from jianshen92 and removed request for a team February 13, 2026 11:42

Mr-Neutr0n mentioned this pull request Feb 13, 2026

[Batch Inference] AttributeError when using bentoml.batch.run_in_spark - 'APIMethod' object has no attribute 'input' #5524

Open

ci: auto fixes from pre-commit.ci

97404ae

For more information, see https://pre-commit.ci

Mr-Neutr0n wants to merge 2 commits into bentoml:main from Mr-Neutr0n:fix/spark-batch-apimethod-compat
