Recently I was converting some old Python 2 code to Python 3, and I ran across a problem with pickling and unpickling.
It wasn’t a major problem, really; I found the solution fairly quickly with a bit of googling around.
Still, I think the problem and its solution are worth a quick note. Others will stumble across this problem in the future, especially because there are code examples floating around (in printed books and online posts) that will lead new Python programmers to make this very same mistake.
So let’s talk about pickling.
Suppose you want to “pickle” an object — dump it to a pickle file for persistent storage.
When you pickle an object, you do two things.
- You open the file that you want to use as the pickle file. The call to open(…) returns a file handle object.
- You pass the object that you want to pickle, and the file handle object, to pickle.dump().
Your code might look something like this. Note that this code is wrong. See below.
fileHandle = open(pickleFileName, "w")
pickle.dump(objectToBePickled, fileHandle)
When I wrote code like this, I got back this error message:
Pickler(file, protocol, fix_imports=fix_imports).dump(obj)
TypeError: must be str, not bytes
Talk about a crappy error message!!!
After banging my head against the wall for a while, I googled around and quickly found a very helpful answer on StackOverflow.
The bottom line is that a Python pickle file is (and always has been) a byte stream. This means that you should always open a pickle file in binary mode: “wb” to write it, and “rb” to read it. The Python docs contain correct example code.
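Here is a minimal sketch of the correct round trip. The object and the file name are made up for illustration (the file goes in the system temp directory and is deleted at the end); the variable names just mirror the broken snippet above. The part that matters is the “wb” and “rb” modes.

```python
import os
import pickle
import tempfile

# Hypothetical object and file name, for illustration only.
objectToBePickled = {"name": "blah", "values": [1, 2, 3]}
pickleFileName = os.path.join(tempfile.gettempdir(), "example.pickle")

# Write: open the pickle file in binary mode ("wb").
with open(pickleFileName, "wb") as fileHandle:
    pickle.dump(objectToBePickled, fileHandle)

# Read: open the pickle file in binary mode ("rb").
with open(pickleFileName, "rb") as fileHandle:
    restoredObject = pickle.load(fileHandle)

print(restoredObject == objectToBePickled)  # True

# Clean up the temporary pickle file.
os.remove(pickleFileName)
```

Using “with” blocks is optional, but it guarantees the file is closed (and flushed) before you try to read it back.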
My old code worked just fine running under Python 2 (on Windows). But with Python 3’s new strict separation of strings and bytes, it broke. Changing “w” to “wb”, and “r” to “rb”, fixed it.
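To see that strict separation directly, here is a small sketch (assuming Python 3, and using throwaway files from the standard tempfile module) that makes the same pickle.dump call against a text-mode handle and a binary-mode handle. The exact wording of the TypeError varies between Python 3 versions, so the sketch just catches it rather than matching the message.

```python
import pickle
import tempfile

obj = "blah"  # hypothetical object; any picklable object behaves the same way

# Text mode, like open(name, "w"): write() accepts only str, so pickle.dump fails.
textModeFailed = False
with tempfile.TemporaryFile(mode="w") as textHandle:
    try:
        pickle.dump(obj, textHandle)
    except TypeError as exc:
        textModeFailed = True
        print("text mode:", exc)

# Binary mode, like open(name, "wb"): write() accepts the bytes pickle produces.
with tempfile.TemporaryFile(mode="wb") as binaryHandle:
    pickle.dump(obj, binaryHandle)
    bytesWritten = binaryHandle.tell()

print("binary mode wrote", bytesWritten, "bytes")
```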
One person who posted a question about this problem on the Python forum was aware of the issue, but confused because he was trying to pickle a string.
import pickle
a = "blah"
file = open('state', 'w')
pickle.dump(a, file)

I know of one easy way to solve this is to change the operation argument from ‘w’ to ‘wb’ but I AM using a string not bytes! And none of the examples use ‘wb’ (I figured that out separately) so I want to have an understanding of what is going on here.
Basically, whatever kind of object you are pickling (even a string object), pickle converts it to a bytes representation and writes it out as a byte stream. This means that you always need to use “wb” and “rb” for pickle files, no matter what you are pickling.
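A quick way to convince yourself of this is to pickle the forum poster’s string in memory with pickle.dumps, which returns the pickle as a bytes object instead of writing it to a file, and then check the types:

```python
import pickle

a = "blah"  # a str object, as in the forum question

pickled = pickle.dumps(a)  # serialize to an in-memory pickle
print(type(a))        # <class 'str'>
print(type(pickled))  # <class 'bytes'>

# Unpickling the bytes gives back an equal str.
print(pickle.loads(pickled) == a)  # True
```

The object going in is a str, but what pickle produces is bytes, and bytes can only be written to a file opened in binary mode.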
