Wyatt Baldwin

Finding the Oddity in a Sequence

After reading Ned Batchelder’s Iter-tools for puzzles: oddity post yesterday, my first thought was to use itertools.groupby(). That version is the fastest I could come up with (by quite a bit actually, especially for longer sequences), but it requires sorting the iterable first, which uses additional space and won’t work with infinite sequences.

My next thought was to use a set to keep track of seen elements, but that requires the keys to be hashable, so I scrapped that idea.

I figured using a list to keep track of seen elements wouldn’t be too bad if the seen list was never allowed to grow beyond two elements. After playing around with this version for a while, I finally came up with something on par with Ned’s version performance wise while meeting the following objectives:

Some interesting things:

Here’s the code (the gist includes a docstring, tests, and a simplistic benchmark):

def oddity(iterable, key=None, first=False):
    seen = []
    common = []
    uncommon = []

    for item in iter(iterable):
        item_key = key(item) if key else item

        if item_key in seen:
            if item_key in common:
                if first and uncommon:
                    return item_key, uncommon[1]
                if len(common) == 2:
                    raise TooManyCommonValuesError

                if item_key in uncommon:
                    i = uncommon.index(item_key)
                    j = i + 2
                    uncommon[i:j] = []
            if len(seen) == 3:
                raise TooManyDistinctValuesError
            uncommon.extend((item_key, item))

    if len(seen) == 0:
        raise EmptyError

    if len(common) == 0:
        raise NoCommonValueError

    if len(uncommon) == 0:
        uncommon_value = None
        uncommon_value = uncommon[1]

return common[0], uncommon_value