This PR adds new APIs for controlling Perspective's polling behavior, which determines when and how `Table::update` calls are flushed to their respective registered `View`s. Additionally, the internal Python API has been optimized for this feature, and more generally for better thread utilization when called in a multi-threaded context.

Update calls in Perspective occur in two steps. First, the update data, in the domain language's native format, is serialized and stored in an internal queue (which the source refers to as a `port`). Next, some time later (language/transport/etc. dependent), an internal method `perspective_server::Session::poll` is called: the queue is flushed, the `Table` is updated in place, and registered `View` instances are recalculated and notified. By slightly delaying the application of `poll`, back pressure from the expensive step of reconciling updates can be conflated in favor of serializing more input in the first step. In effect, a properly tuned `poll` step allows the engine to dynamically trade frame rate for data throughput.

Unfortunately, there isn't a good way to specify a tuned `poll` strategy that works across language implementations and supports both `async` and multi-threaded calls without compromise. Instead, this PR adds a new `Server` constructor option, `on_poll_request`, which (when specified) disables the internal (sub-optimal) polling strategy in favor of a user-provided callback that is responsible for scheduling the `poll` call.

For example, in Python you may implement polling naively as such:
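The following is a minimal sketch of such a naive strategy: every poll request is serviced synchronously and immediately. The callback signature (receiving the object whose `poll()` must be invoked) is an assumption for illustration, and `FakeServer` is a stand-in so the sketch runs without the library; with the real API the callback would be passed to the `Server` constructor per this PR.

```python
# Naive polling: service every poll request synchronously, right away.
# ASSUMPTION: the callback receives the object whose `poll()` must be
# scheduled; check the actual `on_poll_request` signature in the release.
def on_poll_request(poller):
    poller.poll()

# Stand-in for a real perspective Server so the sketch runs anywhere.
class FakeServer:
    def __init__(self):
        self.polls = 0

    def poll(self):
        self.polls += 1

server = FakeServer()
on_poll_request(server)  # engine requests a poll; updates flush immediately
print(server.polls)  # 1
```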
... but in a multi-threaded environment which updates frequently and has many connected `Client`/`View` instances to update, you may instead opt to skip `poll` whenever another thread is already processing the previous `poll` (which would otherwise block). Other options include dispatching the `poll` call to a dedicated thread, an executor pool, or an event loop.
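A minimal sketch of that skip-if-busy strategy uses a non-blocking `threading.Lock`, so a poll request arriving while another thread is still inside `poll` is simply dropped rather than blocking. As above, the callback signature and `FakeServer` stand-in are assumptions for illustration:

```python
import threading

_poll_lock = threading.Lock()

def on_poll_request(poller):
    # Non-blocking acquire: if another thread holds the lock, it is still
    # applying the previous poll, so skip this request instead of blocking.
    if _poll_lock.acquire(blocking=False):
        try:
            poller.poll()
        finally:
            _poll_lock.release()

# Stand-in for a real perspective Server so the sketch runs anywhere.
class FakeServer:
    def __init__(self):
        self.polls = 0

    def poll(self):
        self.polls += 1

server = FakeServer()
on_poll_request(server)      # lock free: poll runs
with _poll_lock:             # simulate another thread mid-poll
    on_poll_request(server)  # lock busy: request is skipped
print(server.polls)  # 1
```

Dispatching to an event loop instead would replace the callback body with something like `loop.call_soon_threadsafe(poller.poll)`.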
`on_poll_request` allows powerful performance optimizations for streaming, multi-tenant systems, but it has a major trade-off: the engine is no longer guaranteed to be immediately consistent when run in this mode. Depending on your `poll` implementation, a `View::to_columns` call may not yet reflect a `Table::update` applied chronologically first. This may be problematic for workflows which use Perspective objects interactively, such as in a Jupyter notebook. As such, default `Server` instances do not use `on_poll_request`, and implementing it should only be necessary when the default behavior is insufficient.

Other changes in this PR:

- `perspective_python::AsyncServer` & `perspective_python::AsyncSession` are end-to-end async implementations of their sync counterparts. You must use these if you want Perspective's internal Python callbacks to run on the caller thread's event loop, and these internal callbacks must themselves be async.
- `perspective::set_num_cpus` & `perspective::num_cpus` control the size of the internal thread pool, used for parallelizing e.g. computation over columns.
- `perspective::Server::new` now takes an options struct as an argument, which may specify the `on_poll_request` parameter to override the server's default polling.
- `perspective::Client::set_loop_callback` is deprecated. If you want to dispatch to another thread, you must do so in the `Session`/`Client` callback implementation (see `tornado.py`).
- `await` update calls. Previously, we relied on JavaScript's pervasive single-threaded-ness to guarantee that, while such calls were not awaited, they'd still end up being processed in order. While this is still technically true, it prevents us from using the existing test suite to validate that `on_poll_request` does not break any other logic. If you also make this assumption in JavaScript Perspective <= 3.6.1, you'll need to make this change as well.
- `tornado`, `aiohttp` and `starlette` handlers have been updated to take an `executor` argument which allows incoming messages to be processed in parallel on a thread pool.
In addition, a lot of testing (and small changes) has gone into making sure that the GIL is properly released during these calls, such that a Python executor can be properly saturated. The `python-tornado-streaming` example has been updated to showcase a correct `ThreadPoolExecutor`.
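For completeness, the executor-dispatch strategy mentioned earlier can be sketched with a standard-library `ThreadPoolExecutor`: each poll request is handed to a worker thread, so the requesting thread never blocks on the engine. The callback signature and `FakeServer` stand-in are, again, assumptions for illustration, not Perspective's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

# A single worker serializes polls while keeping them off the caller thread.
executor = ThreadPoolExecutor(max_workers=1)

def on_poll_request(poller):
    # Hand the poll off to the pool; the requesting thread returns at once.
    executor.submit(poller.poll)

# Stand-in for a real perspective Server so the sketch runs anywhere.
class FakeServer:
    def __init__(self):
        self.polls = 0

    def poll(self):
        self.polls += 1

server = FakeServer()
on_poll_request(server)
executor.shutdown(wait=True)  # in a real server the pool stays alive
print(server.polls)  # 1
```

Using `max_workers=1` keeps `poll` calls strictly ordered, which makes this a drop-in alternative to the lock-based variant above.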