Finding the Oddity in a Sequence

After reading Ned Batchelder’s Iter-tools for puzzles: oddity post yesterday, my first thought was to use itertools.groupby(). That version is the fastest I could come up with (by quite a bit actually, especially for longer sequences), but it requires sorting the iterable first, which uses additional space and won’t work with infinite sequences.

My next thought was to use a set to keep track of seen elements, but that requires the keys to be hashable, so I scrapped that idea.

I figured using a list to keep track of seen elements wouldn’t be too bad if the seen list was never allowed to grow beyond two elements. After playing around with this version for a while, I finally came up with something on par with Ned’s version performance wise while meeting the following objectives:

  • Don’t require keys to be hashable
  • Don’t store the elements read from the iterable
  • Support infinite sequences

Some interesting things:

  • Rearranging the branches to use in instead of not in sped things up more than I would have thought
  • Initially, I was checking the lengths of the seen and common lists on every iteration, which slowed things down noticeably (which isn’t all that surprising really)
  • Setting the key function to the identity function lambda v: v is noticeably slower than checking to see if it’s set on every iteration (also not surprising)
  • Using sets/dicts for seen, common, and uncommon didn’t make a noticeable difference (I tried adding a hashable option and setting things up accordingly, but it wasn’t worth the additional complexity)

Here’s the code (the gist includes a docstring, tests, and a simplistic benchmark):

def oddity(iterable, key=None, first=False):
    seen = []
    common = []
    uncommon = []

    for item in iter(iterable):
        item_key = key(item) if key else item

        if item_key in seen:
            if item_key in common:
                if first and uncommon:
                    return item_key, uncommon[1]
                if len(common) == 2:
                    raise TooManyCommonValuesError

                if item_key in uncommon:
                    i = uncommon.index(item_key)
                    j = i + 2
                    uncommon[i:j] = []
            if len(seen) == 3:
                raise TooManyDistinctValuesError
            uncommon.extend((item_key, item))

    if len(seen) == 0:
        raise EmptyError

    if len(common) == 0:
        raise NoCommonValueError

    if len(uncommon) == 0:
        uncommon_value = None
        uncommon_value = uncommon[1]

return common[0], uncommon_value

Leave a Reply

Your email address will not be published. Required fields are marked *