Mon 27 October 2014

Filed under Blog

Tags snippets

I came across this question while reading through the Python mailing list. I was actually sort of offended. Why would universities across the world agree to teach this foundational topic in Computer Science if it wasn't useful? Recursion is the natural approach to a class of problems that would be much more difficult to solve with a for loop.

Recognizing a Recursive Problem

When you see a problem that looks like a russion doll, think recursion.

Russian Dolls

So a russian doll is one where you have a big doll, but if you open it up there is a smaller doll inside. This keeps going until you get to the smallest doll that you can't open. Now Imagine that you're trying to paint russian dolls, however once you've painted one, you can't open it up because you have to wait for it to dry. What do you do?

You open up the first doll, find a doll inside, open up the second doll, and keep going until you've found the smallest one. You paint the smallest one, then put it back into the second smallest, then repeat until you've painted all of them.

Now that you have the intuition, let's look at an actual code example.

My Real World Example

This is an example derived from my own work. I've just swapped the actual subject matter to recipes because it's easier to relate to; the crux of the problem is the same.

Note: All of this is in Python >= 3.3

Here is the nested structure we are dealing with:

recipes = {
    'apple_pie': {
        'ingredients': ['flour', 'sugar', 'eggs', 'shortening', 'apples', 'cinnamon'],
        'cook_time': 60,
        'prep_time': 30,
        'ratings': {'john': 10, 'emily': 8, 'mom': 4}
    },
    'peanut_butter_cookies': {
        'ingredients': ['flour', 'sugar', 'peanut_butter', 'butter', 'eggs'],
        'cook_time': 45,
        'prep_time': 20,
        'allergens': ['peanuts']
    },
    'clam_chowder': {
        'ingredients': ['flour', 'milk', 'clams', 'vegetable_stock', 'salt'],
        'cook_time': 120,
        'prep_time': 30,
        'allergens': ['shellfish']
    }
}

Each recipe has commonalities but not all recipes are guaranteed to have all keys. For example, allergens is not present in apple pie but it is in peanut butter cookies and clam chowder. Similarly, apple pie has ratings but neither of the other two do.

I'd really like an easy way to reach in and check for the existence of a key at any level of nesting. So I'd like to see if there is a rating from john for my apple pie recipe.

One naive way to solve this is to just do the checks.

    result = False
    if 'apple_pie' in recipes:
         if 'ratings' in recipes['apple_pie']:
            if 'john' in recipes['apple_pie']['ratings']:
                result = True

    print result

Alternatively, you could combine all the checks into a line by relying on Python's ability to short circuit. A bit better, but not much.

    result = False
    if 'apple_pie' in recipes and 'ratings' in recipes['apple_pie'] and 'john' in recipes['apple_pie']['ratings']:
        result = True
    print result

Still Ugghz.

Let's Use Recursion!

The basic idea is to go into the dictionary and flatten all of the nested dictionaries so that the keys are accessible at the top level. Then we could write code like:

result = False
if 'apple_pie/ratings/john' in flattened(recipes):
    result = True
print result

Here is the first step towards flatten. It will only do one level of flattening. That is, if the top level dictionary has a dictionary as a value, it will fuse them together. We do this without recursion but it's an important step towards the full solution.

def flatten_one(obj):
    if isinstance(obj, dict):
        # If object is a dictionary, flatten it one level.
        result = {}
        for key, value in obj.items():
            if isinstance(value, dict):
                for subkey, subvalue in value.items():
                    flat_key = '/'.join([key, subkey])
                    result[flat_key] = subvalue
            else:
                result[key] = value
    else:
        # otherwise just return the object
        result = obj

    return result

pprint.pprint(flatten_one(recipes))

Output:

{'apple_pie/cook_time': 60,
 'apple_pie/ingredients': ['flour',
                           'sugar',
                           'eggs',
                           'shortening',
                           'apples',
                           'cinnamon'],
 'apple_pie/prep_time': 30,
 'apple_pie/ratings': {'emily': 8, 'john': 10, 'mom': 4},
 'clam_chowder/allergens': ['shellfish'],
 'clam_chowder/cook_time': 120,
 'clam_chowder/ingredients': ['flour',
                              'milk',
                              'clams',
                              'vegetable_stock',
                              'salt'],
 'clam_chowder/prep_time': 30,
 'peanut_butter_cookies/allergens': ['peanuts'],
 'peanut_butter_cookies/cook_time': 45,
 'peanut_butter_cookies/ingredients': ['flour',
                                       'sugar',
                                       'peanut_butter',
                                       'butter',
                                       'eggs'],
 'peanut_butter_cookies/prep_time': 20}

Now the problem is if there are nested dictionaries, our solution doesn't work. So in this example, the key apple_pie/ratings exists, but the value is still a dictionary. What we'd really like is for deeply nested dictionaries to get pulled up into the top level.

Here is the full implementation:

def flatten(obj):
    if isinstance(obj, list):
    # If object is a list flatten every sub element
        result = []
        for elt in obj:
            flat_elt = flatten(elt)
            result.append(flat_elt)
    elif isinstance(obj, dict):
    # If object is a dictionary, flatten it one level.
        result = {}
        for key, value in obj.items():
            if isinstance(value, dict):
                flat_value = flatten(value)
                for subkey, subvalue in flat_value.items():
                    flat_key = '/'.join([key, subkey])
                    result[flat_key] = subvalue
            else:
                result[key] = value
    else:
        # otherwise just return the object
        result = obj

    return result


print("\n\n=== Flattened ===\n\n")
pprint.pprint(flatten(recipes))

Output:

{'apple_pie/cook_time': 60,
 'apple_pie/ingredients': ['flour',
                           'sugar',
                           'eggs',
                           'shortening',
                           'apples',
                           'cinnamon'],
 'apple_pie/prep_time': 30,
 'apple_pie/ratings/emily': 8,
 'apple_pie/ratings/john': 10,
 'apple_pie/ratings/mom': 4,
 'clam_chowder/allergens': ['shellfish'],
 'clam_chowder/cook_time': 120,
 'clam_chowder/ingredients': ['flour',
                              'milk',
                              'clams',
                              'vegetable_stock',
                              'salt'],
 'clam_chowder/prep_time': 30,
 'peanut_butter_cookies/allergens': ['peanuts'],
 'peanut_butter_cookies/cook_time': 45,
 'peanut_butter_cookies/ingredients': ['flour',
                                       'sugar',
                                       'peanut_butter',
                                       'butter',
                                       'eggs'],
 'peanut_butter_cookies/prep_time': 20}

There! Now we can actually access our recipes in the desired way.

result = False
if 'apple_pie/ratings/john' in flattened(recipes):
    result = True
print result

There you go. A real world lesson in recursion with all the code example fixings!

If you enjoyed this tutorial, sign up for the Python Practice Projects mailing list to get more real world examples for learning Python!

Comment

Python Practice Projects © Louie Dinh Powered by Pelican and Twitter Bootstrap. Icons by Font Awesome and Font Awesome More