Idiomatic Python - Adapter Pattern

#programming patterns #chemoinformatics

Writing idiomatic Python (also known as Pythonic code) is a greatly appreciated skill, since it improves both the readability and the maintainability of the code. Taking a non-Pythonic API ported from C++ as an example, I show how to adapt it using the Adapter design pattern.

The non-Pythonic API

RDKit is a widely used toolkit for working with molecules. With it you can compute properties, build molecules, fragment them, and so on. The core data structures and algorithms are written in C++, and the Python wrappers are built automatically using Boost.Python, so the API has a C++ style. Here is a sample of RDKit code in Python:

from rdkit.Chem import MolFromSmiles


celecoxib_smiles = "CC1=CC=C(C=C1)C1=CC(=NN1C1=CC=C(C=C1)S(N)(=O)=O)C(F)(F)F"
celecoxib_mol = MolFromSmiles(celecoxib_smiles)

print("This is the number of atoms that our molecule has:", celecoxib_mol.GetNumAtoms())
print("The symbols of its atoms: ", end="")
for atom in celecoxib_mol.GetAtoms():  # Note this line
    print(atom.GetSymbol(), end="")
print()

print("The atom with the index 0 has the symbol:", celecoxib_mol.GetAtomWithIdx(0).GetSymbol())  # Note this line
print("Our molecule has the following properties:", celecoxib_mol.GetPropsAsDict())  # Note this line
celecoxib_mol.SetBoolProp("Inspected", True)
print("Our molecule has the following properties:", celecoxib_mol.GetPropsAsDict())

$ python adapter-pattern-01.py
This is the number of atoms that our molecule has: 26
The symbols of its atoms: CCCCCCCCCCNNCCCCCCSNOOCFFF
The atom with the index 0 has the symbol: C
Our molecule has the following properties: {}
Our molecule has the following properties: {'Inspected': True}

Notice GetAtoms(), SetBoolProp() and GetAtomWithIdx()? In Python there are more idiomatic ways to express these operations.

The adapter pattern

For Python, we would like a more Pythonic API, one that brings Python idioms and syntactic conventions to RDKit. To get there, we can use the adapter pattern: wrap the existing RDKit classes in adapter classes that delegate to the methods of the underlying RDKit objects.

I start by creating a MolAdapter. It takes a string (the SMILES) as its argument and stores the RDKit molecule in an attribute. Then I define the first special method: __len__, which is called when len() is used on the object.

from rdkit.Chem import MolFromSmiles


class MolAdapter(object):

    def __init__(self, smiles):
        self.old_mol = MolFromSmiles(smiles)

    def __len__(self):
        return self.old_mol.GetNumAtoms()


celecoxib_smiles = "CC1=CC=C(C=C1)C1=CC(=NN1C1=CC=C(C=C1)S(N)(=O)=O)C(F)(F)F"
celecoxib_mol = MolAdapter(celecoxib_smiles)
print("This is the number of atoms that our molecule has:", len(celecoxib_mol))

$ python adapter-pattern-02.py
This is the number of atoms that our molecule has: 26

Great! Now I can retrieve the number of atoms by calling len() on the molecule adapter.
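
One side effect is worth knowing: when a class defines __len__ but not __bool__, Python uses the length to decide truthiness, so an adapter wrapping a molecule with zero atoms evaluates as false. The sketch below illustrates this with FakeMol, a hypothetical stand-in for an RDKit molecule (it is not part of RDKit), so that it runs without the toolkit installed.

```python
class FakeMol:
    """Hypothetical stand-in for an RDKit molecule (illustration only)."""

    def __init__(self, symbols):
        self._symbols = list(symbols)

    def GetNumAtoms(self):
        return len(self._symbols)


class MolAdapter:
    """Same idea as the adapter in this post, but wrapping the stand-in object."""

    def __init__(self, mol):
        self.old_mol = mol

    def __len__(self):
        return self.old_mol.GetNumAtoms()


water = MolAdapter(FakeMol(["O"]))  # one heavy atom
empty = MolAdapter(FakeMol([]))     # no atoms at all

# Without __bool__, truthiness falls back to __len__.
print(len(water), bool(water))
print(len(empty), bool(empty))
```

If an empty molecule should still count as "present", adding a __bool__ method that returns True would be one way to decouple the two behaviours.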

Going further

Now I adapt the rest of the behaviour shown in the first snippet of code.

from rdkit.Chem import MolFromSmiles


class MolPropertyTypeError(TypeError):
    pass


class MolAtomIndexError(IndexError):
    pass


class MolProperties(object):

    def __init__(self, mol):
        self.mol = mol

    def __setitem__(self, name, value):
        if isinstance(value, bool):
            self.mol.SetBoolProp(name, value)
        elif isinstance(value, int):
            self.mol.SetIntProp(name, value)
        elif isinstance(value, float):
            self.mol.SetDoubleProp(name, value)
        elif isinstance(value, str):
            self.mol.SetProp(name, value)
        else:
            err = f"Expected types: 'int', 'float', 'str' or 'bool', got '{type(value).__name__}'."
            raise MolPropertyTypeError(err)

    def __getitem__(self, name):
        # GetProp() always returns a string; GetPropsAsDict() keeps the stored type.
        return self.mol.GetPropsAsDict()[name]

    def __str__(self):
        props = self.mol.GetPropsAsDict()
        items = ", ".join([f"{item}={value}" for item, value in props.items()])
        return f"{self.__class__.__name__}({items})"


class MolAdapter(object):

    def __init__(self, smiles):
        self.old_mol = MolFromSmiles(smiles)
        self.properties = MolProperties(self.old_mol)

    def __len__(self):
        return self.old_mol.GetNumAtoms()

    def __getitem__(self, index):
        # Valid indices run from 0 to len(self) - 1.
        if not 0 <= index < len(self):
            raise MolAtomIndexError(f"Atom with index {index} doesn't exist.")
        return self.old_mol.GetAtomWithIdx(index)

    def __iter__(self):
        for atom in self.old_mol.GetAtoms():
            yield atom


celecoxib_smiles = "CC1=CC=C(C=C1)C1=CC(=NN1C1=CC=C(C=C1)S(N)(=O)=O)C(F)(F)F"
celecoxib_mol = MolAdapter(celecoxib_smiles)

print("This is the number of atoms that our molecule has:", len(celecoxib_mol))
print("The symbols of its atoms: ", end="")
for atom in celecoxib_mol:
    print(atom.GetSymbol(), end="")
print()

print("The atom with the index 0 has the symbol:", celecoxib_mol[0].GetSymbol())
print("Our molecule has the following properties:", celecoxib_mol.properties)
celecoxib_mol.properties["Inspected"] = True
print("Our molecule has the following properties:", celecoxib_mol.properties)

$ python adapter-pattern-03.py
This is the number of atoms that our molecule has: 26
The symbols of its atoms: CCCCCCCCCCNNCCCCCCSNOOCFFF
The atom with the index 0 has the symbol: C
Our molecule has the following properties: MolProperties()
Our molecule has the following properties: MolProperties(Inspected=True)

Note that I implement two classes here: MolProperties and MolAdapter. I build the first one because the __*item__ methods of the adapter are already used to treat it as a sequence of atoms, so I can’t set and retrieve properties with celecoxib_mol["Inspected"]: that would try to get the atom with index “Inspected”.
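
Since the adapter behaves like a sequence of atoms, one possible refinement is to inherit from collections.abc.Sequence, which derives __iter__, __reversed__, __contains__, index() and count() from just __len__ and __getitem__. The following is a minimal sketch, not the adapter from this post: FakeAtom, FakeMol and MolSequence are hypothetical names, and the two stand-in classes only mimic the RDKit methods used here so that the example runs without RDKit. It also accepts negative indices, which is an addition of this sketch; slices are not handled.

```python
from collections.abc import Sequence


class FakeAtom:
    """Hypothetical stand-in for an RDKit atom (illustration only)."""

    def __init__(self, symbol):
        self._symbol = symbol

    def GetSymbol(self):
        return self._symbol


class FakeMol:
    """Hypothetical stand-in for an RDKit molecule (illustration only)."""

    def __init__(self, symbols):
        self._atoms = [FakeAtom(symbol) for symbol in symbols]

    def GetNumAtoms(self):
        return len(self._atoms)

    def GetAtomWithIdx(self, idx):
        return self._atoms[idx]


class MolSequence(Sequence):
    """Adapter exposing the atoms of a molecule as a read-only sequence."""

    def __init__(self, mol):
        self.old_mol = mol

    def __len__(self):
        return self.old_mol.GetNumAtoms()

    def __getitem__(self, index):
        size = len(self)
        # Map negative indices onto positions counted from the end.
        position = index + size if index < 0 else index
        if not 0 <= position < size:
            raise IndexError(f"Atom with index {index} doesn't exist.")
        return self.old_mol.GetAtomWithIdx(position)


mol = MolSequence(FakeMol(["C", "C", "O"]))  # heavy atoms of ethanol

symbols = [atom.GetSymbol() for atom in mol]              # __iter__ comes from Sequence
last = mol[-1].GetSymbol()                                # negative index
backwards = [atom.GetSymbol() for atom in reversed(mol)]  # __reversed__ comes from Sequence
print(symbols, last, backwards)
```

The inherited __iter__ works by calling __getitem__ with 0, 1, 2, ... until IndexError is raised, which is one more reason to get the bounds check right.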

For MolProperties I use a custom object with __*item__ methods rather than a dictionary, because I want it to push the properties to self.old_mol itself instead of keeping them in a separate dictionary. On MolAdapter I add __getitem__ to get atoms by index and __iter__ to make the molecule iterable over its atoms. I also create two custom exceptions: one for when a user passes an unexpected type to properties, and one for when a user tries to retrieve a non-existent atom by index.
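
The write-through idea can be taken a step further by building the properties object on collections.abc.MutableMapping, which supplies update(), get(), items(), the in operator and more once five methods are defined. This is only a sketch under stated assumptions: FakeMol is a hypothetical stand-in that stores properties in a plain dict, and it uses a single SetProp() for every type to stay short, whereas the real RDKit object needs the typed setters shown earlier.

```python
from collections.abc import MutableMapping


class MolPropertyTypeError(TypeError):
    pass


class FakeMol:
    """Hypothetical stand-in for an RDKit molecule; only mimics property storage."""

    def __init__(self):
        self._props = {}

    def SetProp(self, name, value):  # simplified: one setter for all types
        self._props[name] = value

    def GetPropsAsDict(self):
        return dict(self._props)

    def ClearProp(self, name):
        del self._props[name]


class MolProperties(MutableMapping):
    """Mapping view that writes every change through to the wrapped object."""

    def __init__(self, mol):
        self.mol = mol

    def __setitem__(self, name, value):
        if not isinstance(value, (bool, int, float, str)):
            raise MolPropertyTypeError(
                f"Expected 'int', 'float', 'str' or 'bool', got '{type(value).__name__}'."
            )
        self.mol.SetProp(name, value)

    def __getitem__(self, name):
        return self.mol.GetPropsAsDict()[name]

    def __delitem__(self, name):
        self.mol.ClearProp(name)

    def __iter__(self):
        return iter(self.mol.GetPropsAsDict())

    def __len__(self):
        return len(self.mol.GetPropsAsDict())


mol = FakeMol()
props = MolProperties(mol)
props["Inspected"] = True
props.update(Source="demo")  # update() is provided by MutableMapping

# The view holds no state of its own: the wrapped object has everything.
print(mol.GetPropsAsDict() == dict(props))

try:
    props["Bad"] = [1, 2, 3]
except TypeError as exc:  # the custom error is still caught as a TypeError
    print(type(exc).__name__)
```

Because the custom exceptions subclass the built-in ones, callers who only know about TypeError or IndexError keep working, while callers who care can catch the more specific class.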

