Building a Mutable Set

By Brian Ruud, June 01, 2003

Getting around the immutability of std::set in the name of performance.

I recently implemented an algorithm that computes the intersection (if any) among each pair of segments in a collection of nonvertical line segments, as shown in Figure 1. The requirements of the algorithm and the application in which it was used forced me to look for a new type of container to provide the flexibility and efficiency I needed. What I found was not a new container, but a twisted version of std::set that perfectly suited my needs. Though based on std::set, this construction is by no means general-purpose; it was developed to help solve a very specific problem. On the other hand, this is not just a fancy parlor trick. I have used this construction effectively in production code.

The Algorithm

The algorithm I implemented is a well-known scanline algorithm developed by Bentley and Ottmann [1]. As a scanline sweeps across the collection of segments, information is gathered and interpreted, and intersections are reported.

The basic algorithm is shown in Figure 2. As the algorithm proceeds and intersections are found, the x-coordinates of these intersection points are added to ScanSet. The segments in the subcollection ActiveSegs are ordered based on the current scan position x_c. For each segment in ActiveSegs, its intersection with the vertical line at x_c is calculated. The segments are then sorted by increasing value of the y-coordinates of these points of intersection. What makes this algorithm difficult is that this sort order changes each time the scan position moves. In particular, if two or more segments intersect at x_c, the relative order of their y-coordinates to the left of x_c is the reverse of their order to the right of x_c (see Figure 3). A naive implementation of the algorithm would reorder the entire collection of segments each time the scanline moves. But this is a very inefficient ( O(n^2 * log(n) ) algorithm. A more efficient solution exploits the fact that the algorithm only needs to re-order small groups of intersecting segments, rather than the entire collection. I'll describe that solution later in this article.

The details of detecting and reporting segment intersections are not fundamental to this article, so they are omitted. To put the algorithm into context, the line segment collections may contain several hundred thousand segments, so runtime efficiency is paramount. On the other hand, the users of this code typically have ample memory on their machines, so memory efficiency is of lesser importance. Also, all segment endpoints and intersections have integer coordinates.

Choosing a Container

Looking at the algorithm in Figure 2, the subcollection ActiveSegs must be able to perform the following operations:

1. Change the sort order of ActiveSegs on each iteration.

2. Insert a segment in sorted order.

3. Remove a segment.

4. Reverse the order of intersecting segments.

The following chart summarizes the runtime behavior of the standard containers on each of the four requirements. map is excluded, since no associations are being formed, as is deque, since it behaves more or less like vector. That leaves list, vector, and set as possible candidates.

    Requirement      list          vector        set
   -----------------------------------------------------------
1) Change sorting    O(n*log(n))   O(n*log(n))   not possible
2) Insert segment    O(n)          O(n)          O(log(n))
   in sorted order
3) Remove segment    constant      O(n)          O(log(n))
4) Reverse segments  O(n)          O(n)          not possible

list and vector perform similarly on everything but removal. Unfortunately, they have linear run time or worse for most of the required operations. Since these operations are performed within the main loop of the algorithm, run times are quadratic, so list and vector are simply too slow.

That leaves set, which does quite well on insertion and removal, because that is what it is designed to do. Unfortunately, set is not designed for any form of reorganization. Copying the data from one set to another could satisfy requirements 1 and 4, but this approach would be very slow.

In summary, the standard containers make trade offs between flexibility and runtime efficiency. list and vector are flexible and slow, while set is fast but limited in what it can do. Or is it?

Building an std::set

When a set of line segments is created, the set must be told how to compare two segments. You can perform this comparison using a function pointer, a function object, or an operator< for the segment class. Since sorting segments for this algorithm requires external state (the x-position of the scanline), this state must be stored somewhere. Putting the state in Segment::operator< is a poor choice, because a segment should not care about how it is sorted. Instead, I put the state inside a function object used by the set. A typical implementation of such an object is shown in Listing 1.

The set declaration looks like this:

int x_coord = some_scan_position;
std::set<Segment, fCompare> 
 SegSet(fCompare(x_coord));

This set is a perfectly acceptable and will correctly order all of the segments inserted into it based on the x-coordinate stored in fCompare. But what happens when the scanline moves and this x-coordinate needs to change? Unfortunately, std::set cannot provide this flexibility.

Messing With a Good Thing

The Standard Library is very flexible, so it makes sense to investigate ways of forcing std::set to reorganize its data how I want it, when I want it. A natural target is the state data, m_scan_pos, which is stored inside the set's comparison object. Creating a means to dynamically change the state data would satisfy the first requirement of the algorithm. There are two set methods, key_comp() and value_comp(), that return a comparison object of the same type as the one stored by the set. However, the C++ Standard dictates that these methods only return copies of the comparison object, not the object itself, so any modifications to the returned object would not affect the state of the object stored by the set.

A first attempt at circumventing this problem might be to make the state data m_scan_pos a static variable within fCompare, and then provide a static method set_scan_position() to change the data. This way, a copy of the comparison object could be modified and the changes would be reflected in the object stored by the set. Of course, this strategy would prevent fCompare from being used in more than one set at a time unless all the sets want their scan positions changed simultaneously.

A more subtle approach would be to pass the set a reference to the comparison object, rather than the object itself. Then the referenced object could be modified and the changes would be reflected within the set. Unfortunately (for this discussion), the Standards committee knew the dangers in allowing a set's comparison object to be modified in this way, so the Standard explicitly prevents std::set from storing such a reference [2].

It's Dangerous, But it Works

It turns out that it is not necessary to keep control of the entire comparison object. If m_scan_pos is created apart from the comparison object, and a pointer or reference to the data is stored inside the comparison object, the state data can change without the set knowing that it is happening. An implementation follows:

class fCompare
{
  public:
    fCompare(const int& coord) : 
      m_scan_pos(coord) {}
  private:
    const int& m_scan_pos;
};

int scan_pos;

std::set<Segment, fCompare> 
  SegSet( fCompare(scan_pos) );

I Was Warned...

Finding a way to modify the comparison object satisfies the first requirement of the segment intersection algorithm, but the technique described in the preceeding section is dangerous when used by itself. Listing 2 shows an example of a set of integers that can be sorted in increasing or decreasing order. The output from this program is the following:

0 1 2 3 4 
4 3 2 0 1 2 3 4 1 0

No matter what the intentions were when creating this set, this probably was not the expected outcome. For a set to do its job, it must rely on the state of its data and the consistency of its comparison object. Once either the state or the comparison object is modified, the set will still attempt to do its job, but likely with unintended results. This point is so important that there are entire items in both Scott Meyers' book, Effective STL [3] (Item 22) and in Herb Sutter's book, More Exceptional C++ [4] (Item 8), illustrating the dangers in modifying a set. To be accurate, both items talk of the pitfalls of modifying a set's contents by means of const_cast, but in summary, they both say that once you do something that makes the set's contents inconsistent with the set's comparison object, the set is no longer responsible for what may go wrong, so you'd better know what you're doing.

Being Careful

Item 22 of Effective STL concludes with "...if you perform any in-place modifications of [set] elements, you are responsible for making sure that the container remains sorted." This is what will be done with the set of segments. My task, therefore, is to ensure that each time the sort order of the set is changed, the contents of the set are modified to obey the new order.

Getting back to the algorithm, at each scan position x_c, the collection of active segments needs to have its ordering updated to reflect this new position. If intersections exist at this scan position, manual reordering of the segments is required to keep the segment order compatible with the comparison object. That is, if segment s1 intersects segment s2 at x_c, the order of these two segments needs to be reversed when the scanline crosses their point of intersection. But their order relative to the remaining segments in the collection does not change; any segments that were above or below the two intersecting segments will still be above or below them. The upshot of this is that if the order of the segments in each intersecting group is manually reversed each time the scan position is updated, the segment order will agree with the current state of the comparison object, and the underlying std::set will behave consistently.

I have two options for performing the reversal of intersecting segments:

After updating the scan position, remove and reinsert each intersecting segment. The intersecting segments are then inserted in the correct order. This is slow but safe.

Find each iterator pointing to an intersecting segment, apply const_cast<Segment&> to the result of iterator::operator* to get an assignable segment, and reassign the value of the segment. This is fast and dangerous.

Implementation

Listing 3 shows a partial implementation of MutableSegmentSet. This class contains a private set of segments, as well as the state data, m_scan_pos, used by the comparison object fCompare. fCompare contains a constant reference back to the set, and whenever it performs a comparison, it retrieves the current scan position from the set.

The task of reordering intersecting segments is performed using the second option just described. This option can be made safer by creating a private MutableIter class inside MutableSegmentSet. This iterator inherits from std::set::iterator and redefines operator* to make it return a Segment&, allowing write access to the contents of the iterator. In addition, the increment and decrement operators are redefined to return MutableIters, so that they can be used in standard algorithms. Now the order of a range (first, last) can be reversed by writing

MutableIter mi_first(first);
MutableIter mi_last(last);
std::reverse(mi_first, mi_last);

In the main loop of the algorithm, each time a new scan position is reached, the method set_scan_pos() is called, which, in turn, calls the private method reorder(). This method is responsible for detecting groups of segments that intersect at the current scan position and then calling the reverse() method to reverse their order. reverse() makes use of the MutableIter and uses std::reverse to do the actual reordering. All of the usual set methods (insert, erase, find) are passed through to the contained m_set. In its full context, the MutableSegmentSet would also add functionality for reporting any discovered intersections and for updating the collection of scan positions.

Aside from the details of reporting segment intersections, this implementation shows the necessary framework for creating a "mutable" set. Access to the mutable portion of the set's functionality is limited to one method, set_scan_pos(), which is in charge of maintaining the consistency of the comparison object and the order of segments in the set.

Conclusion

The task of developing and implementing an algorithm frequently consists of a tension between the functional and performance requirements of the algorithm and the tools available for implementing the algorithm. In this case, the tools provided by the Standard Library provided the necessary functionality but suffered from prohibitively slow run times and had to be extended in new ways. The solution presented here is not useful so much for the problem it solves but rather for the techniques it uses. Even for something as strictly controlled and specified as std::set, there are ways of making it do what it was specifically designed not to do, all in the name of performance. And the change is invisible to both the user and the underlying set.

Acknowledgments

I would like to thank Pete Isensee, Paul Shaffer, and Caryn Ruud for their proofreading and helpful comments.

Notes

[1] Jon Bentley and Thomas Ottmann, "Algorithms for Reporting and Counting Geometric Intersections," IEEE Trans. Computers C-28, 643-647 (1979).

[2] ISO/IEC, International Standard, Programming Languages -- C++, Reference Number ISO/IEC 14882:1998(E), 1998, Section 23.1.2.

[3] Scott Meyers, Effective STL, Addison-Wesley, 2001.

[4] Herb Sutter, More Exceptional C++, Addison-Wesley, 2002.

About the Author

Brian Ruud holds a Ph.D. in mathematics from the University of Washington. He is a software engineer in the field of electronic design automation and is currently working for Virage Logic (www.viragelogic.com/), focusing on geometric algorithm design and infrastructure. His interests are in STL usage and generic programming. He can be reached at [email protected].

1 2 3 4 5 6 7 Next

More Insights

INFO-LINK


	To upload an avatar photo, first complete your Disqus profile. \| View the list of supported HTML tags you can use to style comments. \| Please read our commenting policy.

Building a Mutable Set

The Algorithm

Choosing a Container

Building an std::set

Messing With a Good Thing

It's Dangerous, But it Works

I Was Warned...

Being Careful

Implementation

Conclusion

Acknowledgments

Notes

About the Author

Related Reading

More Insights

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content

Building a Mutable Set

The Algorithm

Choosing a Container

Building an std::set

Messing With a Good Thing

It's Dangerous, But it Works

I Was Warned...

Being Careful

Implementation

Conclusion

Acknowledgments

Notes

About the Author

Related Reading

News

Commentary

Slideshow

Video

Most Popular

More Insights

White Papers

Reports

Webcasts

Currently we allow the following HTML tags in comments:

Single tags

Matching tags

Recent Articles

Most Popular

This month's Dr. Dobb's Journal

Upcoming Events

Featured Reports

Featured Whitepapers

Most Recent Premium Content