safe iteration while modifying lists

safe iteration while modifying lists
Photo by Sebastian Herrmann / Unsplash

Lists are arrays + a memory allocator so that upon modification it can reallocate its data array if necessary. They are essential to computing but they aren't thread safe. Not even coroutine safe.

But we can fix that. A safe variant that costs a teenytiny bit more during modification can be made. When you add/remove things from the list it must first check that its data pointer and iteration pointer are the same. If they are then it must first copy the data.

When an iterator starts it sets the iterator pointer to the data pointer and begins iterating. When it's done it sets the iterator pointer back to ΓΈ nothing. Only one coroutine/thread can be iterating the safe-iteration-list at a time.

The other approach is to make a copy up-front, which is the usual approach. Another approach is to put the list in a memory segment that can then be set to read-only. This will cause a page fault when someone tries to modify it and it can be copied at that point allowing the iterator (as many as you like) to safely keep iterating the list.

The last approach is to emulate that hardware supported page fault with a variant of list that includes an iterator count. Any time an iterator starts it increases it and when it's done it decreases it. If the iterator count > 0 then any modification needs to be also a copy.

There two are less efficient than the single-thread solution because you never know who has been iterating the latest version, therefore every modification needs to be a copy.

Ultimately, the dead-simplest way to do any of this is to manage how you use the list. If you know you intend to modify it while iterating it, copy it for the iterator. It can be cheaper to maintain two lists in parallel too. One for modification and one for iteration. The iteration version 'catches up' to the modification version when all iterators are done.

One final approach is to create a segmented list. This is a list of lists. It is efficient to iterate but for random access it has to do a search across its spans. This allows an efficient 'copy' of the list and any edits are applied to the original by cutting up the segments with slices. Remove something? you've split it in to two. You could, later, 'merge' all the segments back together by copying the whole segmented list.