@texodus texodus commented May 12, 2025

This PR adds new APIs for controlling Perspective's polling behavior, which determines when and how Table::update calls are flushed to their respective registered Views. Additionally, the internal Python API has been optimized for this feature, and more generally to achieve better thread utilization when called from a multi-threaded context.

Update calls in Perspective occur in two steps. First, the update data, in its domain-language native format, is serialized and stored in an internal queue (which the source refers to as a port). Some time later (language/transport/etc. dependent), when the internal method perspective_server::Session::poll is called, the queue is flushed, the Table is updated in place, and registered View instances are recalculated and notified. By slightly delaying poll, back pressure from the expensive step of reconciling updates can be relieved by conflating queued updates, in favor of serializing more input in the first step. In effect, a properly-tuned poll step allows the engine to dynamically trade frame rate for data throughput.

Unfortunately, there isn't a good way to specify a tuned poll strategy that works across language implementations and supports both async and multi-threaded calls without compromise. Instead, this PR adds a new Server constructor option, on_poll_request, which (when specified) disables the internal (sub-optimal) polling strategy in favor of a user-provided callback responsible for scheduling the poll call.

For example, in Python you may implement polling naively as such:

server = perspective.Server(on_poll_request=lambda s: s.poll())

... but in a multi-threaded environment which updates frequently and has many connected Client/View instances to update, you may opt for something like this, which skips calling poll if another thread is already processing the previous poll (which would otherwise block):

import threading

lock = threading.Lock()

def on_poll_request(server):
    # Skip if another thread is already processing the previous poll.
    if lock.acquire(blocking=False):
        try:
            server.poll()
        finally:
            lock.release()

perspective_server = perspective.Server(
    on_poll_request=on_poll_request,
)

Other options include dispatching the poll call to a dedicated thread, executor pool or event loop.
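The dedicated-executor option can be sketched as follows. This is a minimal illustration, not part of the PR itself: `FakeServer` is a stand-in for a real `perspective.Server` (only the `on_poll_request` option and `poll` method from this PR are assumed), so the dispatch pattern can be shown self-contained.

```python
from concurrent.futures import ThreadPoolExecutor

# A single-worker executor serializes poll calls while keeping the
# producing (update) threads unblocked.
poll_executor = ThreadPoolExecutor(max_workers=1)

def on_poll_request(server):
    # Dispatch the potentially expensive poll off the caller's thread.
    poll_executor.submit(server.poll)

# Stand-in for perspective.Server, for illustration only; a real
# application would instead construct:
#   server = perspective.Server(on_poll_request=on_poll_request)
class FakeServer:
    def __init__(self):
        self.polls = 0

    def poll(self):
        self.polls += 1

server = FakeServer()
for _ in range(3):
    on_poll_request(server)  # simulate three poll requests
poll_executor.shutdown(wait=True)
print(server.polls)  # 3 -- each request was dispatched and ran
```

Because the executor has a single worker, polls never run concurrently, so no lock is needed; requests queue behind the in-flight poll instead of being dropped.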

While on_poll_request allows powerful performance optimization for streaming, multi-tenant systems, it has a major trade-off: the engine is no longer guaranteed to be immediately consistent when run in this mode. Depending on your poll implementation, a View::to_columns call may not yet reflect a Table::update applied chronologically first. This may be problematic for workflows which use Perspective objects interactively, such as in a Jupyter Notebook. As such, default Server instances do not use on_poll_request, and implementing it should only be necessary when the default polling behavior is insufficient.
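The consistency caveat can be illustrated with a toy model of the two-step update described above (a pending queue that is only applied on poll). All names here are illustrative stand-ins, not the Perspective API:

```python
# Toy model of deferred polling: updates queue until poll() flushes them.
class ToyTable:
    def __init__(self):
        self._pending = []  # step 1: serialized updates (the "port")
        self._rows = []     # reconciled state visible to Views

    def update(self, row):
        self._pending.append(row)

    def poll(self):
        # Step 2: flush the queue and reconcile in place.
        self._rows.extend(self._pending)
        self._pending.clear()

    def to_columns(self):
        return list(self._rows)

table = ToyTable()
table.update({"x": 1})
stale = table.to_columns()  # poll hasn't run yet: update not visible
table.poll()
fresh = table.to_columns()  # after poll, the update is reflected
print(stale, fresh)  # [] [{'x': 1}]
```

Between the update and the deferred poll, a read observes the old state; this is exactly the window an interactive workflow must account for.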

  • perspective_python::AsyncServer & perspective_python::AsyncSession are end-to-end async implementations of their sync counterparts. You must use these if you want Perspective's internal Python callbacks to run on the caller thread's event loop, and these internal callbacks must themselves be async.
  • perspective::set_num_cpus & perspective::num_cpus for controlling the size of the internal threadpool, used for parallelising e.g. computation over columns.
  • perspective::Server::new now takes an options struct as an argument, which may specify the on_poll_request parameter to override the server's default polling.
  • perspective::Client::set_loop_callback is deprecated - if you want to dispatch to another thread, you must do so in the Session/Client callback implementation - see tornado.py.
  • The JavaScript test suite has been mass-updated to always await update calls. Previously, we relied on JavaScript's pervasive single-threaded-ness to guarantee that while such calls were not awaited, they'd still end up being processed in order. While this is still technically true, it prevents us from using the existing test suite to validate that on_poll_request does not break any other logic. If you also make this assumption in JavaScript Perspective <= 3.6.1, you'll need to make this change as well.
  • The tornado, aiohttp and starlette handlers have been updated to take an executor argument which allows incoming messages to be processed in parallel on a thread pool. In addition, a lot of testing (and small changes) have gone into making sure that the GIL is properly released during these calls, such that a Python executor can be properly saturated.
  • The python-tornado-streaming example has been updated to showcase a correct ThreadPoolExecutor.

@texodus texodus added enhancement Feature requests or improvements breaking labels May 12, 2025
@texodus texodus force-pushed the better-conflation branch 3 times, most recently from 2c1996d to 7abcf2b Compare May 13, 2025 00:52
Signed-off-by: Andrew Stein <steinlink@gmail.com>
@texodus texodus force-pushed the better-conflation branch from 7abcf2b to c3be4ae Compare May 13, 2025 01:54
@texodus texodus marked this pull request as ready for review May 15, 2025 16:32
@texodus texodus merged commit 43bd62c into master May 15, 2025
14 checks passed
@texodus texodus deleted the better-conflation branch May 15, 2025 16:42