Playing with Ruby Sets.

The Set class in Ruby gives us, well, sets and the ability to perform operations on them, like the sets from mathematics.

In some cases we need a collection of values without duplicates. In these cases, knowing a little about how to use sets will make our code shine and can help performance a lot.

We will get a better understanding of how sets work by playing around with some examples:

Let’s imagine a case where we need to work with a large in-memory collection of components, and we need to end up with only unique values in it. Instead of checking each of our elements by hand, where the processing time grows with the size of the collection, we can use the set library to do the job for us.

require "set"
names = Set.new ["Jack", "John Locke"]
names.add "Hurley"
names.add "Jack"
names.add "Sayid"
names.add "John Locke"
=> #<Set: {"Jack", "John Locke", "Hurley", "Sayid"}>

Looking at the final set of names, you can see that the values Jack and John Locke weren’t added twice, and we didn’t need any extra check before calling the add method.

(Don’t forget the require "set" when you’re planning to use sets in your code.)
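
To make the comparison concrete, here is a minimal sketch that puts the manual check an Array would need next to the Set version. The helper names unique_with_array and unique_with_set are made up for this illustration. add? is a regular Set method that returns nil when the element is already present.

```ruby
require "set"

# Hypothetical helper: de-duplicate by hand with an Array.
# Every include? call scans the array, so the cost grows with its size.
def unique_with_array(values)
  result = []
  values.each { |value| result << value unless result.include?(value) }
  result
end

# Hypothetical helper: let Set skip the duplicates for us.
def unique_with_set(values)
  set = Set.new
  values.each { |value| set.add(value) }
  set
end

names = ["Jack", "John Locke", "Hurley", "Jack", "Sayid", "John Locke"]

p unique_with_array(names)    # => ["Jack", "John Locke", "Hurley", "Sayid"]
p unique_with_set(names).to_a # => ["Jack", "John Locke", "Hurley", "Sayid"]

# add? reports whether the element was new: it returns the set itself
# when the value was added and nil when it was already there.
cast = Set.new(["Jack"])
p cast.add?("Jack").nil?   # => true
p cast.add?("Hurley").nil? # => false
```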

Now, let’s take two ranges of values as an example, where the two ranges share some values.

range1 = (0...40)
range2 = (20...70)

If we simply add up the sizes of the two ranges, we get 90, as you can see:

range1.size + range2.size
=> 90

So, to start using the power of sets on these two ranges, we have two basic options. We can use the to_set method to convert the object from a Range to a Set, or we can instantiate a new Set object, passing our range as the argument. Let’s see both in practice:

1) Using the method to_set

range1 = (0...40)
range1.class
=> Range
range1.to_set

If we forget to load the set library with require "set" (on Ruby versions older than 3.2, which do not load it automatically), we get an error like:

NoMethodError: undefined method `to_set' for 0...40:Range

Otherwise, we will be able to see our new Set object:

range1.to_set
=> #<Set: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39}>
range1.to_set.class
=> Set

2) Instantiating a new Set object
As we mentioned before, we can instantiate a new Set object passing our range as the argument:

set1 = Set.new(range1)
=> #<Set: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39}>
set2 = Set.new(range2)
=> #<Set: {20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69}>
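
One detail worth checking when converting ranges is whether the end value is part of the range. The short sketch below compares a three-dot range with a two-dot one. The variable names exclusive and inclusive are only for this illustration.

```ruby
require "set"

exclusive = (0...40).to_set # three dots: 40 is left out
inclusive = (0..40).to_set  # two dots: 40 is included

p exclusive.size         # => 40
p inclusive.size         # => 41
p exclusive.include?(40) # => false

# Both conversion styles produce the same set.
p exclusive == Set.new(0...40) # => true
```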

Finally, with our ranges converted to sets, we can start playing with some set operations, for example the union of both sets:

union = set1 + set2
union.size
=> 70

Do you remember how many elements we got before, when we added up the sizes of the two ranges? Exactly, 90 elements. Now we’re getting 70, successfully skipping the duplicated interval from 20 to 39.

A good observation to make at this point is that if we try to add the two ranges directly, we get an epic failure, as we can see below:

range1 + range2
=> NoMethodError: undefined method `+' for 0...40:Range

That’s why we used the size method earlier to find out how many elements the two ranges hold, instead of adding them directly with the + operator (method) as we are doing with sets now.
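
For comparison, a rough sketch of what the same job looks like without sets: convert the ranges to arrays, concatenate them, and remove the duplicates afterwards. The variable name combined is just for this example.

```ruby
range1 = (0...40)
range2 = (20...70)

# Range itself has no + method, so we go through arrays.
p range1.respond_to?(:+) # => false

combined = range1.to_a + range2.to_a
p combined.size      # => 90, the overlap from 20 to 39 is counted twice
p combined.uniq.size # => 70, the same count as the union of the two sets
```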

The + operator also has two equivalent spellings, | and union, as we can see below:

(set1 + set2).size # => 70
(set1 | set2).size # => 70
(set1.union(set2)).size # => 70

If we are looking for the difference between one set and the other, we can use the - operator:

set1 - set2
=> #<Set: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19}>
set2 - set1
=> #<Set: {40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69}>

We can get the full (symmetric) difference of both sets by doing:

(set1 - set2) + (set2 - set1)
=> #<Set: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69}>

Or we can simply use the ^ operator to get the same result:

set1 ^ set2
=> #<Set: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69}>
(set1 ^ set2) == ((set1 - set2) + (set2 - set1))
=> true

To get the intersection of our sets, we can use the & operator or the intersection method:

set1 & set2
=> #<Set: {20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39}>
(set1 & set2).size #=> 20
(set1.intersection(set2)).size #=> 20
(set1 & set2) == set1.intersection(set2)
=> true

So, if you need a collection of elements without duplicates and with fast lookups, you should consider using the Ruby Set class.
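
To get a feeling for the lookup difference, here is a small, unscientific benchmark sketch. The helper time_lookups and the sizes are assumptions chosen for the demo, and the actual timings will vary from machine to machine, so no output is shown.

```ruby
require "set"
require "benchmark"

# Hypothetical helper: time `count` membership checks for `value`.
def time_lookups(collection, value, count)
  Benchmark.realtime { count.times { collection.include?(value) } }
end

array = (0...100_000).to_a # arbitrary size, picked only for the demo
set = array.to_set
missing = -1 # worst case for the array: every element gets compared

puts format("Array#include?: %.4fs", time_lookups(array, missing, 1_000))
puts format("Set#include?:   %.4fs", time_lookups(set, missing, 1_000))
```

Array#include? walks the elements one by one, while Set#include? does a hash lookup, so the gap should widen as the collection grows.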

There are also other operations available for working with sets in Ruby, but for now, I believe this is enough to remember how they work and maybe start using them in the future.
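
As a small taste of those other operations, the sketch below tries a few predicate methods from the Set class on the same two ranges. The variable names overlap and only_in_set1 are only for this illustration.

```ruby
require "set"

set1 = (0...40).to_set
set2 = (20...70).to_set

overlap = set1 & set2
only_in_set1 = set1 - set2

p overlap.subset?(set1)        # => true, every shared value is in set1
p set1.superset?(overlap)      # => true, the same fact seen from set1
p set1.intersect?(set2)        # => true, they share at least one value
p only_in_set1.disjoint?(set2) # => true, nothing in common with set2
```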

Tailor Fontela

Software Developer. Full-time apprenticeship.