Python: dictionaries

a map from a set of keys to a set of values

Python Dictionaries

In Python, we can use a dictionary to associate keys with values.

This code creates a simple dictionary called en_to_es (short for “English to Español”) that maps the words one, two and three (as Python strings) to their Spanish counterparts (also as Python strings):

en_to_es = { 'one' : 'uno', 'two' : 'dos', 'three' : 'tres' }

Once you create a dictionary, you can access the values by looking up their keys. Here, we try some Python dictionary code at the interactive Python shell:

>>> en_to_es = { 'one' : 'uno', 'two' : 'dos', 'three' : 'tres' }
>>> en_to_es['one']
'uno'
>>> en_to_es['three']
'tres'
>>> 

If a particular key is not in the dictionary, and you try to look it up, you get a KeyError, like this:

>>> en_to_es['ten']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'ten'
>>> 

Handling KeyError with try/except

You can use a so-called try/except block to write custom code that catches the KeyError and, instead of printing a scary-looking error message, does whatever you would prefer.

Suppose we run this file:

def translate(myDictionary, wordToLookup):
    ''' Look up a word. Return None if the word is not found. '''
    try:
        return myDictionary[wordToLookup]
    except KeyError:
        # print() with parentheses is required in Python 3
        print("The word ", wordToLookup, " was not in the dictionary")
        return None


en_to_es = { 'one' : 'uno', 'two' : 'dos', 'three' : 'tres' }

Then, we can use the function translate to do translation with a “nicer” error message.

=============== RESTART: /Users/pconrad/Documents/translate.py ===============
>>> en_to_es
{'three': 'tres', 'two': 'dos', 'one': 'uno'}
>>> en_to_es['one']
'uno'
>>> en_to_es['ten']

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    en_to_es['ten']
KeyError: 'ten'
>>> translate(en_to_es,'one')
'uno'
>>> translate(en_to_es,'ten')
The word  ten  was not in the dictionary
>>> 
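
As an alternative to try/except, a lookup can also be guarded with the in operator or with the built-in dict.get method, which returns a default value instead of raising KeyError. The sketch below is one possible variation, not part of the original translate.py; the function name translate_or_default and its default parameter are made up for illustration.

```python
en_to_es = {'one': 'uno', 'two': 'dos', 'three': 'tres'}

def translate_or_default(my_dictionary, word_to_lookup, default=None):
    """Hypothetical helper: return the translation, or default if missing."""
    # dict.get never raises KeyError; it returns the default instead
    return my_dictionary.get(word_to_lookup, default)

# The in operator tests whether a key is present
print('one' in en_to_es)   # True
print('ten' in en_to_es)   # False

print(translate_or_default(en_to_es, 'one'))         # uno
print(translate_or_default(en_to_es, 'ten', '???'))  # ???
```

Whether to use try/except or get is mostly a matter of taste; get is handy when a missing key is an ordinary, expected case.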

Dictionaries of Dictionaries

Two-letter language codes for various human languages have been standardized by the International Organization for Standardization (ISO), and can be looked up in the ISO 639-1 code list.

Here are just a few. The third column explains why, for example, German is de and Chinese is zh.

code  Language (in English)  Language (in that language, using Latin alphabet)
de    German                 Deutsch
en    English                English
es    Spanish                Español
fr    French                 Français
fa    Persian                Farsi
zh    Chinese                Zhongwen

We could construct a dictionary of these codes like this:

codeToLanguage = {
  'de' : 'German',
  'en' : 'English',
  'es' : 'Spanish',
  'fr' : 'French',
  'fa' : 'Persian',
  'zh' : 'Chinese',
}

If we wanted to translate the numbers 'one', 'two', 'three' into each of these languages, we could create a separate dictionary for each language, like this:

en_to_de = { 'one' : 'eins', 'two' : 'zwei', 'three' : 'drei' }
en_to_es = { 'one' : 'uno', 'two' : 'dos', 'three' : 'tres' }
en_to_fr = { 'one' : 'un', 'two' : 'deux', 'three' : 'trois' }
# etc ...

But there is a better way. It turns out that the value part in a (key,value) pair can be any type; not just a string. One possibility is that it can, itself, be a dictionary.

This gives us many ways that we could construct a dictionary to translate one,two,three into various languages.

Method 1: Use English number as key, then each entry is a dictionary by language:

numDict = { 
    'one':   {'de' : 'eins', 'es':'uno',  'fa':'yek', 'fr':'un',    'zh':'yi' },
    'two':   {'de' : 'zwei', 'es':'dos',  'fa':'do',  'fr':'deux',  'zh':'er' },
    'three': {'de' : 'drei', 'es':'tres', 'fa':'seh', 'fr':'trois', 'zh':'san' }
  }

In this dictionary, writing numDict['one'] returns to us another dictionary, where the keys are the language codes, ('de', 'es', 'fr', etc.). For example:

>>> numDict['one']
{'fa': 'yek', 'fr': 'un', 'de': 'eins', 'zh': 'yi', 'es': 'uno'}
>>> 

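Note that in a dictionary, the keys don’t necessarily appear in the order that we put them into the dictionary, and they don’t even necessarily appear in any particular order. In technical terms, we say that dictionaries are unordered. (This describes older versions of Python, such as the one that produced the output above; since Python 3.7, dictionaries keep their keys in insertion order.)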

So, to get 'one' in a particular language, we can index the dictionary returned by numDict['one'] a second time, with the language that we want. For example, to get 'one' in Chinese ('zh'), we can use:

>>> numDict['one']['zh']
'yi'
>>> 
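
The two-step lookup can be wrapped in a small function that reuses the try/except idea from earlier. This is only a sketch: the function name lookup_number is made up, and it assumes the Method 1 layout of numDict shown above.

```python
numDict = {
    'one':   {'de': 'eins', 'es': 'uno',  'fa': 'yek', 'fr': 'un',    'zh': 'yi'},
    'two':   {'de': 'zwei', 'es': 'dos',  'fa': 'do',  'fr': 'deux',  'zh': 'er'},
    'three': {'de': 'drei', 'es': 'tres', 'fa': 'seh', 'fr': 'trois', 'zh': 'san'},
}

def lookup_number(word, language_code):
    """Hypothetical helper: return word in the given language, or None."""
    try:
        # The first index picks the inner dictionary, the second picks the language
        return numDict[word][language_code]
    except KeyError:
        # Raised if either the word or the language code is missing
        return None

print(lookup_number('one', 'zh'))   # yi
print(lookup_number('two', 'fr'))   # deux
print(lookup_number('ten', 'zh'))   # None
print(lookup_number('one', 'it'))   # None
```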

But that’s not the only way to do it.

Method 2: Index by language, then by word

We could also arrange our dictionary like this:

numberLookup = {
    'de' : { 'one' : 'eins', 'two' : 'zwei', 'three': 'drei'  },
    'es' : { 'one' : 'uno',  'two' : 'dos',  'three': 'tres'  },
    'fa' : { 'one' : 'yek',  'two' : 'do',   'three': 'seh'   },
    'fr' : { 'one' : 'un',   'two' : 'deux', 'three': 'trois' },
    'zh' : { 'one' : 'yi',   'two' : 'er',   'three': 'san'   }
}

Now, when we use numberLookup['es'] for example, we get a dictionary indexed by the numbers in English, 'one', 'two', 'three'.

>>> numberLookup['es']
{'three': 'tres', 'two': 'dos', 'one': 'uno'}
>>> 

So, to look up a particular number, we add a second index:

>>> numberLookup['es']['two']
'dos'
>>> 
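
The two layouts hold the same information, so one can be built from the other. As a rough sketch (the variable name by_language is invented here), the nested loop below rearranges a Method 1 dictionary into the Method 2 layout:

```python
numDict = {
    'one':   {'de': 'eins', 'es': 'uno',  'fa': 'yek', 'fr': 'un',    'zh': 'yi'},
    'two':   {'de': 'zwei', 'es': 'dos',  'fa': 'do',  'fr': 'deux',  'zh': 'er'},
    'three': {'de': 'drei', 'es': 'tres', 'fa': 'seh', 'fr': 'trois', 'zh': 'san'},
}

by_language = {}
for word, translations in numDict.items():
    for code, translated in translations.items():
        # Create the inner dictionary the first time we see this language
        if code not in by_language:
            by_language[code] = {}
        by_language[code][word] = translated

print(sorted(by_language))        # ['de', 'es', 'fa', 'fr', 'zh']
print(by_language['es']['two'])   # dos
```

Which layout is better depends on the question asked most often: "how do I say this word in every language?" favors Method 1, while "what are all the words in this language?" favors Method 2.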

Why are dictionaries important?

Dictionaries are one of the most commonly used data structures in “real world” Python programming, because they correspond very nicely to the way that real world data is often structured. This includes data from databases, websites, etc.

There is a notation called JSON, which stands for JavaScript Object Notation. Although this notation comes from the language JavaScript, it is used across many languages other than JavaScript, including Python.

Many websites and other data services provide access to data in JSON format.

For example, check out the two links below. The first link is a web page with data about Coca Cola Coke: https://world.openfoodfacts.org/product/4890008100309/coca-cola-coke The second link is the same data in JSON format: https://world.openfoodfacts.org/api/v0/product/4890008100309.json

Both links describe the same item, but in different formats. You can confirm that it is the same item by looking at the UPC number, which in this case is ‘4890008100309’.

Applying Python Dictionaries with Real JSON Data

Data in JSON format can be easily converted into a Python dictionary. For our example, we will use JSON data for five different soft drinks from the Open Food Facts API: Coca Cola Coke, Pepsi, Sprite, Fanta Orange, and Dr Pepper.

To learn how to extract JSON data from the web into a Python dictionary, check out this link: Python:JSON
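
As a small offline illustration of that conversion, Python's standard json module can turn JSON text into a dictionary with json.loads. The JSON string below is a hand-written, heavily shortened stand-in for an API response; its field values are illustrative and a real response has many more fields.

```python
import json

# Hand-written sample text, not a real API response
json_text = '''
{
  "code": "4890008100309",
  "product": { "product_name": "Coca Cola Coke" },
  "status_verbose": "product found"
}
'''

cola_data = json.loads(json_text)   # JSON object -> Python dict

print(type(cola_data).__name__)               # dict
print(sorted(cola_data.keys()))               # ['code', 'product', 'status_verbose']
print(cola_data['product']['product_name'])   # Coca Cola Coke
```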

As you may have noticed when looking at the JSON data, each soft drink has four main keys: “code”, “product”, “status_verbose”, and “status”. However, we do not want to wade through all of this data every time we access these items. Instead, we can easily create a new Python dictionary consisting of only the information that we really care about.

As you may have learned already, the value in a Python dictionary can be of any type, and the key can be of any hashable type (such as a string or a number). In our case, the four main keys that we just mentioned are of type str. The values paired with those keys are simple values (such as strings), except for the value paired with the key product. The value for the key product is a dictionary consisting of more key/value pairs (where each value can still be a single data value, an array, or another dictionary). This example of a dictionary within a dictionary is called a nested dictionary.

Assume that ‘cola_data’, ‘pepsi_data’, ‘sprite_data’, ‘fanta_data’, and ‘dr_pepper_data’ are all dictionaries that have been extracted from the JSON data in the Open Food Facts API, and that the code initially has an array (a Python list) consisting of all of these dictionaries.

array_of_data = [cola_data, sprite_data, fanta_data, dr_pepper_data, pepsi_data]

Let’s try to create a Python dictionary consisting only of each item’s UPC number as the key and the item’s product name as the value, which is what this code does:

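from pprint import pprint  # pretty-printer from the standard library

# cola_data, sprite_data, etc. are the product dictionaries described above
array_of_data = [cola_data, sprite_data, fanta_data, dr_pepper_data, pepsi_data]
list_of_UPC_number = {}  # despite the name, this is a dictionary
for items in array_of_data:
    # key: UPC number; value: product name from the nested 'product' dictionary
    list_of_UPC_number[items['code']] = items['product']['product_name']
pprint(list_of_UPC_number)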

This is what the above code prints out:

{'4060800100252': 'Pepsi',
 '4890008100309': 'Coca Cola Coke',
 '5449000014535': 'Sprite',
 '8435185944009': 'Dr Pepper',
 '8847100560094': 'Fanta Orange'}
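
The same dictionary can also be built in one expression with a dictionary comprehension. The sketch below is self-contained, so it uses tiny stand-in dictionaries that contain only the two fields the loop reads (real Open Food Facts data has many more); the name upc_to_name is invented for this example.

```python
from pprint import pprint

# Minimal stand-ins for the product dictionaries (only the fields used here)
array_of_data = [
    {'code': '4890008100309', 'product': {'product_name': 'Coca Cola Coke'}},
    {'code': '5449000014535', 'product': {'product_name': 'Sprite'}},
    {'code': '8847100560094', 'product': {'product_name': 'Fanta Orange'}},
    {'code': '8435185944009', 'product': {'product_name': 'Dr Pepper'}},
    {'code': '4060800100252', 'product': {'product_name': 'Pepsi'}},
]

# {key_expression: value_expression for each item in the list}
upc_to_name = {item['code']: item['product']['product_name']
               for item in array_of_data}

pprint(upc_to_name)
```

With these stand-ins, pprint should show the same five UPC/name pairs as the loop version above.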

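Let’s discuss the code line by line.

The assignment to array_of_data collects the five product dictionaries into a single list. The statement list_of_UPC_number = {} creates an empty dictionary (despite its name, it is a dictionary, not a list). The for loop then visits each product dictionary in turn, calling it items. Inside the loop, items['code'] is the UPC number, and items['product']['product_name'] indexes the nested product dictionary to get the product name; the assignment stores that name under the UPC number as the key. Finally, pprint prints the finished dictionary in a readable layout, sorted by key.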

Using the Python Dictionary Representation of the JSON Object to Answer Questions

Suppose that you are given code in which array_of_data is an array of Python dictionaries representing the JSON data for these soft drink products: Coca Cola Coke, Pepsi, Sprite, Fanta Orange, and Dr Pepper. Similar to the sample Python code above, there is a given Python dictionary called soft_drinks_dict, and a for-loop iterates over array_of_data, naming each soft drink item.

Write a Python expression that would answer these questions.

  1. Q: What would you need in the for-loop in order to get the product name as a key and the UPC number of that product as a value?

    A: soft_drinks_dict[item['product']['product_name']] = item['code']

  2. Q: What would you need in the for-loop in order to get the product name as a key and the amount of sodium in one serving as a value?

    A: soft_drinks_dict[item['product']['product_name']] = item['product']['nutriments']['sodium']
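
To try both answers without downloading anything, the sketch below runs them on two tiny stand-in dictionaries. The sodium numbers are invented placeholders, not real nutrition data, and the dictionary names name_to_upc and name_to_sodium are made up for this example (they play the role of soft_drinks_dict in each question).

```python
# Stand-in product dictionaries with only the fields the answers read
array_of_data = [
    {'code': '4890008100309',
     'product': {'product_name': 'Coca Cola Coke',
                 'nutriments': {'sodium': 0.01}}},   # placeholder value
    {'code': '4060800100252',
     'product': {'product_name': 'Pepsi',
                 'nutriments': {'sodium': 0.02}}},   # placeholder value
]

name_to_upc = {}
name_to_sodium = {}
for item in array_of_data:
    name = item['product']['product_name']
    name_to_upc[name] = item['code']                                # answer 1
    name_to_sodium[name] = item['product']['nutriments']['sodium']  # answer 2

print(name_to_upc['Pepsi'])               # 4060800100252
print(name_to_sodium['Coca Cola Coke'])   # 0.01
```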