@ndgrigorian @vlad-perevezentsev @antonwolfy @alvarovm
Just wanted to reach out about a setup we have been working on while porting an application from cupy to dpnp.
App: Heavily uses cupy and hand-written CUDA kernels. These were ported to DPNP and SYCL kernels for Intel PVC (on Aurora).
Intent: The app relies on `stream = cupy.cuda.get_current_stream()`, `stream.synchronize()`, and passing the obtained stream down to user CUDA kernels: a pipeline methodology where a given instance/process is tied to a single stream, from the upstream cupy calls through to the downstream user CUDA kernels.
With the port to DPNP & SYCL (currently), we have relied on a single queue for the user kernels, which the dpnp/dpctl ops weren't aware of. Since all queues were created with the same default context, running with `ZE_SERIALIZE=2` and `OverrideImmediateCmdListSynchronousMode=1` helped with the initial correctness testing: everything got synchronized, and the default context being shared by any number of queues wasn't a problem.
Goal: The intent now is to move away from the above design and explore whether upstream dpctl can provide us a single master (in-order) queue per device, `q = dpctl.SyclQueue(devs[device_id], property="in_order")`, which we can also pass down to the user SYCL kernels. Since everything is in-order, we expect FIFO behavior similar to a `cudaStream`.
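For context, the per-device master-queue idea we have in mind looks roughly like the sketch below. This is a minimal illustration: `get_master_queue`, the cache, and the injectable `queue_factory` are our own names (the factory injection and deferred import are only there so the caching logic is self-contained), while `dpctl.SyclQueue(dev, property="in_order")` is the documented way to request an in-order queue.

```python
_queue_cache = {}

def get_master_queue(device, queue_factory=None):
    """Return one cached in-order (FIFO) queue per device, cudaStream-style."""
    if queue_factory is None:
        import dpctl  # deferred import; real use would just call dpctl directly
        queue_factory = lambda d: dpctl.SyclQueue(d, property="in_order")
    if device not in _queue_cache:
        _queue_cache[device] = queue_factory(device)
    return _queue_cache[device]
```

Both the dpnp ops (via their `sycl_queue=` keyword) and the user SYCL kernels would then be handed the same queue object, so the in-order property itself provides the FIFO guarantee.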
A few questions, if you can provide any pointers:
0. Would the above-mentioned design be feasible?
- Since there is no equivalent of `stream = cupy.cuda.get_current_stream()`, we tried something like `from ._sycl_queue_manager import get_device_cached_queue`, but the queue obtained from `get_device_cached_queue` is inherently an out-of-order queue, which doesn't really suit our broader needs in the user app, since we are trying to mimic the `cudaStream` FIFO infrastructure. So we had to start using `q = dpctl.SyclQueue(devs[device_id], property="in_order")`.
- Does `SequentialOrderManager` track externally-submitted work on a dpctl-obtained `sycl::queue`?
More details:
Nothing in `SequentialOrderManager` observes the queue itself. So if a user extension does:
```python
q = dpctl.SyclQueue()
a = dpt.arange(..., sycl_queue=q)         # dpctl registers events
my_ext.launch_kernel(q, a.usm_data, ...)  # user submits directly, NOT registered
b = dpt.sum(a, ...)                       # dpctl uses its tracked deps only
```
then the `my_ext.launch_kernel` event is invisible to the manager. A subsequent `dpt.sum` submission's `depends_on` will not include it, and `SequentialOrderManager[q].wait()` will return before the user kernel completes.
Is the following characterization correct?
In-order queue: user-submitted kernels are still correctly ordered against later dpctl submissions because the queue itself serializes, so the data dependency is safe. But `SequentialOrderManager[q].wait()` may return early because it only waits on events it knows about.
Out-of-order queue: nothing orders the user's kernel against later dpctl submissions, so without the user explicitly passing `som.submitted_events` as deps and registering their own event back, this is a genuine race on shared USM.
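The early-return behavior described for the in-order case can be modeled with a toy tracker in pure Python (a stand-in illustration, not dpctl/dpnp code; `ToyOrderManager` and the thread-based "kernels" are our own invention): the manager waits only on events it was told about, so unregistered work is still in flight when `wait()` returns.

```python
import threading
import time

class ToyOrderManager:
    """Stand-in for SequentialOrderManager: waits only on registered events."""
    def __init__(self):
        self._events = []

    def register(self, ev):
        self._events.append(ev)

    def wait(self):
        for ev in self._events:
            ev.wait()

mgr = ToyOrderManager()

# Work submitted through the library: its completion event gets registered.
tracked = threading.Event()
threading.Thread(target=lambda: (time.sleep(0.05), tracked.set())).start()
mgr.register(tracked)

# "External" user kernel submitted directly to the queue, never registered.
external = threading.Event()
threading.Thread(target=lambda: (time.sleep(0.5), external.set())).start()

mgr.wait()  # returns once the tracked work finishes...
external_done_at_wait = external.is_set()
print(external_done_at_wait)  # False: the unregistered work is still in flight
```

In real code the fix would be the cooperation pattern the question above sketches: the extension passes the manager's `submitted_events` as its `depends_on` list and registers its own completion event back with the manager.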
I understand this is a lot, and I would appreciate any feedback. There is also a rich set of DPC++ extensions that could help make this robust and lightweight without explicit bookkeeping; we would be more than happy to contribute if there is sufficient interest.