Ordinal Not in Range
Lastmod: 2023-01-26

Overview

Example errors:

Traceback (most recent call last):
  File "unicode_ex.py", line 3, in <module>
    print str(a) # this throws an exception
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' in position 0: ordinal not in range(128)

This issue happens when Python has to convert between bytes and text (Unicode) using a codec that cannot represent the data involved.

In Python 2, a str can contain any sequence of bytes. When Python is asked to treat that str as text, or to turn text back into bytes, it uses a codec (ASCII by default) and may find bytes or characters that the codec does not allow.

In these situations, the error message usually mentions "ordinal not in range", "codec can't encode character", or "codec can't decode byte".

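Here's a bit of code that reproduces a closely related error (a UnicodeDecodeError rather than a UnicodeEncodeError) in Python 2:

a = '\xa1'                # in Python 2, a str holding the single byte 0xA1
print(a + ' <= problem')  # works: print just writes the raw bytes out
unicode(a)                # fails: decodes a using the default 'ascii' codec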
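In Python 3, str is already text, so '\xa1' is the character ¡ and calling str() on it does not fail. A minimal sketch of the Python 3 counterpart of the reproducer therefore has to start from a bytes literal; the variable names are illustrative only.

```python
# Python 3: text (str) and raw data (bytes) are separate types,
# so the byte 0xA1 must be written as a bytes literal.
a = b'\xa1'

error_message = None
try:
    a.decode('ascii')        # bytes -> text using the ASCII codec
except UnicodeDecodeError as e:
    error_message = str(e)   # keep the message after the except block

print(error_message)
# 'ascii' codec can't decode byte 0xa1 in position 0: ordinal not in range(128)
```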

Initial Steps Overview

  1. Check Python version

  2. Determine codec and character

  3. Determine desired codec

Detailed Steps

1) Check Python version

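The Python version you are using is significant, because Python 2 and Python 3 handle strings differently. In Python 2, a str is a sequence of bytes and text lives in the separate unicode type; in Python 3, a str is Unicode text and raw bytes live in the bytes type.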

You can determine the Python version by running:

python --version

or, if you have access to the running code, by logging it:

import sys
print(sys.version)

The major number (2 or 3) is the number you are interested in.

This runbook assumes you are using Python 2.
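If the same code may run under either version, a small check on sys.version_info can tell you which string model applies. This is a sketch only; the advice strings are illustrative wording, not output from Python itself.

```python
import sys

# sys.version_info[0] is the major version number (2 or 3).
major = sys.version_info[0]

if major == 2:
    # Python 2: str holds bytes; text lives in the separate unicode type.
    advice = 'str is bytes: decode it to unicode with s.decode(codec)'
else:
    # Python 3: str holds text; raw data lives in the bytes type.
    advice = 'str is text: only bytes objects need .decode(codec)'

print('Python %d: %s' % (major, advice))
```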

2) Determine interpreting codec and character

Get this from the error message:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xa1' in position 0: ordinal not in range(128)

In this case, the codec is ascii and the character is u'\xa1', the character with hex code point A1 (¡, the inverted exclamation mark).
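If you can catch the exception, you don't have to read the codec and character off the message: UnicodeEncodeError and UnicodeDecodeError both expose the codec in .encoding, the input in .object, and the failing span in .start and .end. The helper name describe_unicode_error below is hypothetical; the sketch triggers the same failure as str(a) in the traceback above by encoding u'\xa1' to ASCII explicitly.

```python
def describe_unicode_error(exc):
    """Hypothetical helper: return (codec, offending_item, position)."""
    bad = exc.object[exc.start:exc.end]   # the character(s) or byte(s) that failed
    return exc.encoding, bad, exc.start

try:
    u'\xa1'.encode('ascii')   # text -> bytes with ASCII, as str(a) does in Python 2
except UnicodeEncodeError as e:
    codec, bad, pos = describe_unicode_error(e)

print('codec=%s char=%r code point=%s position=%d'
      % (codec, bad, hex(ord(bad)), pos))
```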

What is happening here is that Python is trying to interpret a string and expects every byte in it to be legal for the codec it is using. In this case, the codec is ASCII, which only defines byte values 0-127 (i.e. 7 bits, or 128 possible values, hence range(128) in the message). The hex byte A1 is 161 in decimal, and is therefore out of range.
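The range check above can be sketched in a few lines: anything at or above 2**7 is outside ASCII. The function name non_ascii_bytes is hypothetical; it is meant to run on both Python 2.7 and Python 3, since iterating a bytearray yields integers on both.

```python
# ASCII defines byte values 0-127: 7 bits, so 2**7 == 128 values.
ASCII_LIMIT = 2 ** 7

def non_ascii_bytes(data):
    """Hypothetical helper: (position, value) for every non-ASCII byte."""
    # bytearray() yields integers when iterated, on Python 2 and 3 alike.
    return [(i, b) for i, b in enumerate(bytearray(data)) if b >= ASCII_LIMIT]

print(0xA1)                                  # 161, which is >= 128
print(non_ascii_bytes(b'caf\xc3\xa9 \xa1'))  # [(3, 195), (4, 169), (6, 161)]
```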

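When Python has to convert this string in a context that requires a codec, it can hit this problem. Converting bytes to text is decoding (for example, calling unicode() on a str) and fails with UnicodeDecodeError; converting text to bytes is encoding (for example, calling str() on a unicode string, as in the traceback above) and fails with UnicodeEncodeError.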
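A minimal sketch of the two directions, using byte and unicode literals that should behave the same on Python 2.7 and Python 3. The variable names are illustrative only.

```python
text = u'\xa1'   # a text (unicode) string: the character U+00A1
data = b'\xa1'   # a byte string: the single byte 0xA1

# Encoding goes text -> bytes; ASCII has no byte for U+00A1.
try:
    text.encode('ascii')
except UnicodeEncodeError as e:
    encode_error = e

# Decoding goes bytes -> text; ASCII does not define byte 0xA1.
try:
    data.decode('ascii')
except UnicodeDecodeError as e:
    decode_error = e

print(type(encode_error).__name__)   # UnicodeEncodeError
print(type(decode_error).__name__)   # UnicodeDecodeError
```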

3) Determine desired codec

You need to figure out how the bytes should be interpreted.

Most often in everyday use (e.g. web scraping or document ingestion), this is UTF-8.
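When the source doesn't tell you the codec, one heuristic is to try a few likely candidates in order. This is a sketch under stated assumptions: the helper name decode_with_fallbacks and the candidate list (utf-8, then cp1252, then latin-1) are illustrative choices, not a standard. A successful decode does not prove the codec is right: latin-1 maps every byte to some character and never fails, so it only works as a last resort.

```python
def decode_with_fallbacks(data, codecs=('utf-8', 'cp1252', 'latin-1')):
    """Hypothetical helper: return (text, codec) for the first codec that works."""
    for codec in codecs:
        try:
            return data.decode(codec), codec
        except UnicodeDecodeError:
            # cp1252 leaves a few bytes undefined, so it can fail too.
            continue
    raise ValueError('no candidate codec could decode the data')

print(decode_with_fallbacks(b'caf\xc3\xa9'))  # valid UTF-8, decoded as utf-8
print(decode_with_fallbacks(b'\xa1'))         # not valid UTF-8, falls back to cp1252
```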

Once you have determined the desired codec, solution A may help you.

Solutions List

A) Decode the string

Solutions Detail

A) Decode the string

If you have a byte string s (a Python 2 str) that you want to interpret as UTF-8 data, you can try:

s = s.decode('utf-8')

to decode the bytes with the appropriate codec, which gives you a unicode string to work with.
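A sketch of decoding in practice, runnable on Python 2.7 and Python 3. Note that the byte 0xA1 from the example above is not valid UTF-8 on its own (it is a continuation byte), so decoding it as UTF-8 still fails; the errors argument, here the standard 'replace' handler, trades that failure for data loss by substituting U+FFFD.

```python
raw = b'caf\xc3\xa9'          # UTF-8 bytes for u'caf\xe9'
text = raw.decode('utf-8')    # bytes -> text; now a unicode string

# 0xA1 alone is a UTF-8 continuation byte, so a strict decode fails.
try:
    b'\xa1'.decode('utf-8')
except UnicodeDecodeError as e:
    print(e.reason)           # invalid start byte

# 'replace' substitutes U+FFFD for each undecodable byte instead of raising.
lossy = b'\xa1'.decode('utf-8', 'replace')
print(repr(lossy))
```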

Further Information

Owner

Ian Miell
