[WIP] Batch pushes and pops #103
Conversation
Wow, Raphael, this is a massive contribution with no precedent in the past. I am most thankful, of course, but the feeling of being most impressed overwhelms me. «You will know them by their fruit» and your fruit looks like a product of the most delicate labour of love to me 🤷♂️💯.
Thank you for your kind comments and for such a great library: truly one of the greatest and most thoroughly engineered multi-producer multi-consumer queues out there.
The unit tests fail because they take too long to execute. It looks like the tests get stuck in:
max0x7ba left a comment
It is probably the missing size checks in push and pop that cause the unit test to deadlock.
```cpp
unsigned head;
if(Derived::spsc_) {
    head = head_.load(X);
    head_.store(head + n, X);
```
n can be greater than the buffer size or the number of free slots in the queue. These conditions must be checked.
There does seem to be an issue here, but it is not so trivial to pinpoint the exact data race, involving a wrap-around of the buffer, that triggers it.
I will need some time to figure this one out.
There was an issue in the tests. The stopping condition was not correctly implemented when CONSUMERS * BATCH_SIZE > CAPACITY. This is now fixed.
In general, the issue you are describing can already occur when CONSUMERS * BATCH_SIZE > CAPACITY, meaning it can happen with many consumer threads even with single optimist pushes and pops.
The good news is that this does not cause deadlocks (when the (batched) optimist queue is handled properly); I shall sketch why this is the case later. Tests have been added to cover this case.

The bad news is that when CONSUMERS * BATCH_SIZE > CAPACITY, pushes and pops can happen out of order. For example, two producers can be allocated slots head_1 and head_2 with head_1 < head_2 and head_1 % CAPACITY == head_2 % CAPACITY, and it can then happen that the producer with allocated slot head_2 pushes to the slot first while the producer with allocated slot head_1 has to wait. This is an issue insofar as the queue no longer acts as FIFO.

The sketch of why this does not deadlock is that one can imagine the two producers swapping "roles" when this happens: swap the data that would be pushed to head_1 and head_2, and pretend that it was the first producer, corresponding to head_1, that did the push, while the second producer is the one waiting. (I need to write this down better.)
The case BATCH_SIZE > CAPACITY is not an issue; it just means the producer has to wait until consumers have popped enough for the producer to continue pushing.
```cpp
unsigned tail;
if(Derived::spsc_) {
    tail = tail_.load(X);
    tail_.store(tail + n, X);
```
The batch sizes in the tests are too small relative to the capacity to trigger the mentioned issue (to be addressed). The more likely culprit here is that the tests take longer with the sanitizers. On my machine, they run in a little more than 60s. The additional tests in the current PR increase the test time by 5x, and the pre-existing tests take around 12-13s.
Maybe do shorter tests with sanitizers? Building with sanitizers defines extra macros that can be used to adjust the number of test iterations. Maybe use random batch sizes.
Thinking more about iterators: it is conceivable that the caller of push knows the exact size of the iterator range, yet the iterators may not necessarily be of random-access category. std::distance complexity is O(1) for random-access iterators only and O(n) for anything else, so calling std::distance has non-zero cost in general. We must not call std::distance. Let the caller supply the length of the iterator range; it may have the length already available. A zero-cost batch interface is: It also enables passing in any kind of iterator, including single-pass input iterators, which are often generator objects producing the next value in their overloaded
GitHub Actions hosts may be using the cheapest shared CPUs, so threads get little CPU time. Consumer threads may get delayed and the queues can easily get full in the unit tests.
I extended the test cases to cover large values of BATCH_SIZE, including ones that are > CAPACITY. The batch size is now random every time. The Meson tests should now run in about 20-22 seconds. I converted the PR back to draft to do the following:
This pull request adds batch pushes and pops using iterator semantics to alleviate pressure on the atomic heads and tails of the queue.
In particular, it adds the following functions with the following signatures:
Some details:
Note that int in 2. and unsigned in 4. are purposely chosen such that the implementation uses fewer conversions and is more efficient.