PicklingError in pyspark (PicklingError: Can't pickle <class '__main__.Person'>: attribute lookup Person on __main__ failed)

I am unable to pickle the class below. I am using Databricks 6.5 ML (includes Apache Spark 2.4.5, Scala 2.11).

import pickle
class Person:
  def __init__(self, name, age):
    self.name = name
    self.age = age
p1 = Person("John", 36)
pickle.dump(p1,open('d.pkl','wb'))

PicklingError: Can't pickle <class '__main__.Person'>: attribute lookup Person on __main__ failed



Instead of defining a class, try defining a namedtuple.

import pickle
from collections import namedtuple

Person = namedtuple("Person", "name age")

p1 = Person("John", 36)
pickle.dump(p1,open('d.pkl','wb'))

A possible answer, quoted from another source:

The problem is that you're trying to pickle an instance of a class from the same module (__main__) in which the class is defined. If you move the class into a separate file and import it into your script, then it should work.
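To make the "separate file" suggestion concrete, here is a minimal sketch. The module name person_module_demo and the temporary directory are made up for this illustration; in a real project the file would simply sit next to your script. It only shows plain pickle on a single machine and does not cover shipping the module to Spark executors.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Write the class into its own file. "person_module_demo" is a hypothetical
# module name chosen for this sketch.
module_dir = tempfile.mkdtemp()
Path(module_dir, "person_module_demo.py").write_text(
    "class Person:\n"
    "    def __init__(self, name, age):\n"
    "        self.name = name\n"
    "        self.age = age\n"
)

# Make the directory importable and import the module by name.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
person_module = importlib.import_module("person_module_demo")

# The class is now bound to a name in a real, importable module,
# so pickle can store a reference to it and resolve it again on load.
p1 = person_module.Person("John", 36)
restored = pickle.loads(pickle.dumps(p1))
```

After this runs, restored is a new Person instance whose class is looked up as person_module_demo.Person rather than __main__.Person.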

That solution isn't viable for me in an IPython notebook, though. So here is some additional information from another answer:

Python's pickle does not actually serialize classes. It serializes instances and stores in the serialized data a reference to each instance's class, and that reference relies on the class being bound to a name in a well-defined module. So instances of classes that are not bound to a module-level name, but instead live as attributes of other classes or as data inside lists and dictionaries, typically will not work.
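A small sketch of that "reference, not class body" behaviour, using only the standard library. The class name Hidden and the module name no_such_module_for_demo are invented for the illustration; the second half deliberately builds a class whose module cannot be imported, to provoke the same kind of failure.

```python
import collections
import pickle

# Pickling a class object stores only its module name and class name,
# not the class body.
payload = pickle.dumps(collections.OrderedDict)
stores_reference = b"collections" in payload and b"OrderedDict" in payload

# A dynamically created class whose module cannot be imported.
# "no_such_module_for_demo" is a made-up name used only for this sketch.
Hidden = type("Hidden", (object,), {"__module__": "no_such_module_for_demo"})

try:
    pickle.dumps(Hidden())
    lookup_failed = False
except pickle.PicklingError as exc:
    # pickle could not resolve the class by module and name.
    lookup_failed = True
    print(exc)
```

The instance itself holds no problematic data here. The failure comes only from pickle being unable to find the class through its module and name.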

One straightforward thing to try is using dill instead of pickle. It is a third-party package that works like "pickle" but has extensions to actually serialize arbitrary dynamic classes.

While using dill may help other people who end up here, it does not fit your case: in order to use dill, you'd have to monkey patch the underlying RPC mechanism PySpark is using so that it uses dill instead of pickle, and that might be neither trivial nor consistent enough for production use.

If the problem really is that dynamically created classes are unpicklable, you can create extra metaclasses for the dynamic classes themselves instead of using "type", and on these metaclasses define proper __getstate__ and __setstate__ methods (or the other helper methods described in the pickle documentation). That might enable these classes to be pickled by ordinary pickle. That is, use a separate metaclass with pickle helper methods instead of type(..., (object, ), ...) in your code.
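Here is one way the metaclass idea could be sketched. Note the swap: standard pickle treats a class with a custom metaclass like any other class (it stores a reference) unless a reducer is registered for that metaclass, so this sketch registers one through copyreg.pickle rather than relying on __getstate__ and __setstate__. The names DynamicMeta, reduce_dynamic_class and Settings are hypothetical, the reducer assumes the dynamic class holds only plain data attributes, and whether PySpark's own serializers honour it is not verified here.

```python
import copyreg
import pickle


class DynamicMeta(type):
    """Hypothetical metaclass used in place of plain type(...) for dynamic classes."""


def reduce_dynamic_class(cls):
    # Keep only plain data attributes. Dunder entries such as __dict__ and
    # __weakref__ are recreated by type() and cannot be pickled themselves.
    namespace = {k: v for k, v in vars(cls).items() if not k.startswith("__")}
    # Rebuild with the builtin type(), which pickle can always locate.
    # Trade-off: the restored class no longer uses DynamicMeta.
    return type, (cls.__name__, (object,), namespace)


# Register the reducer for every class created by DynamicMeta.
# Note: this changes the process-wide copyreg dispatch table.
copyreg.pickle(DynamicMeta, reduce_dynamic_class)

# A dynamic class that is not bound to its own name in any module.
settings_cls = DynamicMeta("Settings", (object,), {"retries": 3})

# The class is now pickled by value, so the instance round-trips.
restored = pickle.loads(pickle.dumps(settings_cls()))
```

The restored object belongs to a freshly rebuilt copy of the class, not to the original class object, so identity checks against settings_cls will not hold after unpickling.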

However, "unpicklable object" is not the error you are getting. It is an attribute lookup error, which suggests that the structure you are building does not let pickle introspect it and get all the members from one of your instances; it is not related (yet) to the unpicklability of the class object. Since your dynamic classes live as attributes on the class (which is not itself pickled) and not on the instance, it is quite possible that pickle does not care about them. Check the pickle docs mentioned above; maybe all you need is a proper pickle helper method on your class, with nothing different on the metaclass, for everything you have there to work properly.
