Functional Programming in Python

TL;DR: This post is a digested reading of the official documentation regarding functional programming in Python, with additional information about third-party libraries for implementing Monads and Persistent Data Structures.

For quite some time I’ve been having the itch to learn more about the functional programming paradigm in order to gauge its potencial applications in data engineering as is commonly publicides in many blogs around the web that are oriented for ETL processes and data related projects alike.

Recently, I’ve finally delved into its core concepts in order to understand what it is and to understand what is the fuss all about it and if it would fit my personal taste and style of solving problems.

Several google searches and sub-reddits after, and the common trend was to either learn haskell or F# or go towards some semi-functional language like scala or javascript and learn the concepts from there.

I’ve decided to settle for a local optimum: learn the functional concepts in the language I’m most familiar with and see how it goes from there. There reason for this was not linked to lazyness per se on my part, but mostly because finding the time to delve into this topics is a premium activity for me, so might as well learn something that I can apply now at my current job / projects and, if needed, go deeper into the rabbit hole with a pure functional language like haskell or F#.

Having said that, you can definitely apply the core concepts of the functional paradigm in most languages out there, and Python is no exception. Using Python (or any other language that was not designed to be functional from the beginning) has some challenges regarding implementing all the concepts and design patterns available in the wild, but it is doable. But be warned! You’re mileage may very…

In this post we will take a look at the functional programming landscape in Python and focus on showcasing the functional features Python has to offer. Some features do come out-of-the-box, others can be obtained from third-party libraries. The goal here is to demonstrate how we can use Python to write code in a functional way that is not too disruptive or intrusive a person’s imperative workflow.

Functional programming in Python

The core features of the functional side of Python are summarized by the following tools:

  • native features: iterators and generators [2]
  • native libs: itertools; functools [2]
  • Lambda expressions

These are the foundation for building pure functions in Python without compromising speed for the practical side of things (which is hugely important for me).

Also, to be honest, the native features and libs should get you 80% of the way to improve your code readability, testability and reliability without a huge effort. The remaining 20% (mostly containing the edge cases) you will probably need a few extra tools like Monads or persistent data structures, but don’t sweat too much on these details.

Iterators: what are they and what benefits do they bring?

From the official python documentation [2]: “An iterator is an object representing a stream of data; this object returns the data one element at a time. A Python iterator must support a method called next() that takes no arguments and always returns the next element of the stream. If there are no more elements in the stream, next() must raise the StopIteration exception.”

Compared to other data structures like lists or tuples, they have great space and time performance for larger datasets [3]. For small data sets, iterator and list based approaches have similar performance. This makes dealing with larger datasets easy to deal with.

Example of an iter in action:

>>> L = [1, 2, 3]
>>> it = iter(L)
>>> it
<...iterator object at ...>
>>> it.__next__()  # same as next(it)
1
>>> next(it)
2
>>> next(it)
3
>>> next(it)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>>

Data types that support iterators

Lists, tuples, strings and dictionaries can be converted into iterators, and the other way around.

To iterator:

>>> iter([1,2,3])  # list -> iterator
<list_iterator object at 0x7f37444db550>
>>> iter([1,2,3]).__next__()
1

>>> iter((1,2,3))  # tuple -> iterator
<tuple_iterator object at 0x7f37443b6b90>
>>> iter((1,2,3)).__next__()
1

>>> iter("some string")  # string -> iterator
<str_iterator object at 0x7f37443b6c50>
>>> iter("some string").__next__()
's'

>>> iter({"key": "value"})  # dictionary -> iterator
<dict_keyiterator object at 0x7f3744561230>
>>> iter({"key": "value"}).__next__()
'key'

Note: A string iterator iterates over each character of the string. A dictionary iterator iterates over each key of the dictionary

From iterator:

>>> list(iter([1,2,3]))  # iterator -> list
[1, 2, 3]

>>> tuple(iter([1,2,3]))  # iterator -> tuple
(1, 2, 3)

>>> "".join((iter("some string")))  # iterator -> string
'some string'

>>> dict(iter([('key', 'value')]))  # iterator -> dictionary
{'key': 'value'}

What about generators and generator expressions?

Generators are a special class of functions that simplify the task of writing iterators [2]. Regular functions compute a value and return it, but generators return an iterator that returns a stream of values. Generators are defined as a function using the yield keyword:

>>> def generate_ints(N):
...     for i in range(N):
...         yield i
>>> generate_ints(10)
<generator object generate_ints at 0x7f3744390550>

Generator expressions return an iterator that computes the values as necessary, not needing to materialize all the values at once [2].

>>> line_list = ['  line 1\n', 'line 2  \n', ...]

>>> # Generator expression -- returns iterator
>>> stripped_iter = (line.strip() for line in line_list)
>>> stripped_iter
<generator object <genexpr> at 0x7f3744390a50>

>>> # List comprehension -- returns list
>>> stripped_list = [line.strip() for line in line_list]
>>> stripped_list
['line 1', 'line 2']

Python’s built-in functions: map() and filter()

map() and filter() duplicate the features of generator expressions [2].

What map() does is apply a transformation over all elements of a list or array. But instead of materializing the transformatioin results into a list, it returns an iterator for us to consume.

map(f, iterA, iterB, …) returns an iterator over the sequence f(iterA[0], iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ….

>>> upper_iter = map(lambda x: x.upper(), ["some", "sentence", "fragment"])
>>> upper_iter
<map object at 0x7f82e600b8d0>
>>> list(upper_iter)  # fetch values into a list
['SOME', 'SENTENCE', 'FRAGMENT']

filter(predicate, iter) returns an iterator over all the sequence elements that meet a certain condition, and is similarly duplicated by list comprehensions. A predicate is a function that returns the truth value of some condition; for use with filter(), the predicate must take a single value.

Example:

>>> upper_iter = map(lambda x: x.upper(), ["some", "sentence", "fragment"])
>>> filtered_iter = filter(lambda x: True if len(x) > 4 else False, upper_iter)  # we can chain iterators
>>> filtered_iter
<filter object at 0x7f82e600bd10>
>>> list(filtered_iter)  # fetch values into a list
['SENTENCE', 'FRAGMENT']

Other useful functions that can be used with iterators are: enumerate(), sorted(), any(), all(), and zip().

More complex operations on iterators with the itertools module

The itertools module contains a number of commonly-used iterators as well as functions for combining several iterators [2]. The module has a core set of extra functionality which can be helpfull for some use cases.

The module’s functions fall into a few broad classes:

  • Functions that create a new iterator based on an existing iterator [2]
  • Functions for treating an iterator’s elements as function arguments [2]
  • Functions for selecting portions of an iterator’s output [2]
  • A function for grouping an iterator’s output [2]

Note: For more information about the itertools module, check the official documentation here.

The functools module

The functools module contains some higher-order functions where it takes one or more functions as input and returns a new function. The most useful tool in this module is the functools.partial() function.

For example, sometimes you may want to construct variants of existing functions that have some of the parameters filled in:

import functools

def log(message, subsystem):
    """Write the contents of 'message' to the specified subsystem."""
    print('%s: %s' % (subsystem, message))

server_log = functools.partial(log, subsystem='server')  # new function with arg filled in by default
server_log('Unable to open socket')

You can always call the server_log() callable like the original function, but you now must specify the key=value of the overloaded argument:

server_log('Unable to open socket', 'server')  # errors out
server_log('Unable to open socket', subsystem='server')  # ok

The constructor for partial() takes the arguments (function, arg1, arg2, …, kwarg1=value1, kwarg2=value2). The resulting object is callable, so you can just call it to invoke function with the filled-in arguments.

Other key method in this library for functional programming is the reduce() method. functools.reduce(func, iter, [initial_value]) cumulatively performs an operation on all the iterable’s elements. Thus, it can’t be applied to infinite iterables.

functools.reduce() takes the first two elements A and B returned by the iterator and calculates func(A, B). It then requests the third element, C, calculates func(func(A, B), C), combines this result with the fourth element returned, and continues until the iterable is exhausted.

Example:

>>> import operator, functools
>>> functools.reduce(operator.concat, ['A', 'BB', 'C'])  # concatenate a list of strings
'ABBC'
>>> functools.reduce(operator.concat, [])  # Errors out
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: reduce() of empty sequence with no initial value

Note: For more information about the functools module, check the official documentation here.

The operator module

As other modules we’ve seen so far, this one provides additional methods like concat, add, gt (greater than), etc., that we can profit from and use in our functional-style map -> filter -> reduce workflow.

Some of the functions in the operator module are:

  • Math operations: add(), sub(), mul(), floordiv(), abs(), …
  • Logical operations: not_(), truth().
  • Bitwise operations: and_(), or_(), invert().
  • Comparisons: eq(), ne(), lt(), le(), gt(), and ge().
  • Object identity: is_(), is_not().

Note: For more information about the operator module, check the official documentation here.

Lambda expressions

Functions can be written as inline operations as lambda expressions for small logical operations. This is useful in cases where you can profit from having a small, one-off operation that by being embeded in a part of your program it improves readability and that can be quite simple to read and understand and does not require a full definition of method for reusability.

>>> adder = lambda x, y: x + y
>>> addert(1,2)
3
>>> print_assign = lambda name, value: name + '=' + str(value)
>>> print_assign('key', 1)
'key=1

Because lambdas are limited in the complexity of functions it can define, it is general advice to prefer methods over lambda expressions whenever possible, and keep the scope of lambda expressions as small as possible and as simple as possible.

Therefore, try to avoid writting complex logic with it.

Persistent data structures in Python

There are no persistent data structures built-in in Python. However, there are a few third-party libraries that provide support for persistent data structures in python, namely:

From these three, pyrsistent is the most feature complete and appears to be somewhat maintained by the devs.

The other two seem to be more of a learning exercise than a full featured library that others can use.

Monads libs

Like persistent data structures, Monads are not part of the core library of Python, but there are a couple of third-party libraries which provide implementation of monad data structures:

PyMonad has an extensive list of implemented monads on its belt and it has been inpired by functional programming languages like haskell and F#, and it provides soli implementations of monad data structures.

OSlash is a library intended more to be a tutorial compared to PyMonad. It does have a few articles about functors, applicatives, monoids and a few monads, like the Reader, Writer and State, so, if you are new to monads and monoids, feel free to take a look at it.

Functional programming and efficiency

Functional programming languages are typically less efficient in their use of CPU and memory than imperative languages such as C and Pascal [4].

This is related to the fact that some mutable data structures like arrays have a very straightforward implementation using present hardware. For example, flat arrays can be accessed very efficiently with CPUs, prefetched efficiently through caches or handled with SIMD instructions.

However, persistent data structures are not sluggish to work with. Yes, there is an operational performance impact in using tree structures for some use cases and operations, but keep in mind that amortized time will reduce the impact of such performance hits, and you will generally see time complexity operations in the order of O(log(n)) [10].

Conclusion

Python has the necessary tools to apply the core principles of functional programming. Using functional programming in Python has the potential for you to be more productive and safe regarding in data related operations, but it is a silver bullet to solve every problem out there in the most efficient way.

Python certainly won’t replace any pure functional language in any way, shape or form when considering computational efficiency, optimization or functionality, but it is doable to write functional code with it.

The necessary blocks for FP are provided out-of-the-box for you to to get things done quickly and succintly. There are plenty of third-party libraries out there that were developed in recent years to help close the gap in terms of features compared to “purer” functional programming languages like Haskell or F# that bring missing functionality for pythonists.

All in all, and introducing my personal bias on to this subject, Python is, in some degree, the defacto generalist language out there for solving a myriad of tasks quickly and, as such, it can be viewed as a “good enough” language to apply functional programming principles to your code without major compromises.

Cheers

Sources and Useful Links

[1] https://github.com/dbrattli/OSlash/wiki/Three-Useful-Monads

[2] https://docs.python.org/3/howto/functional.html

[3] https://markmail.org/message/t2a6tp33n5lddzvy

[4] https://en.wikipedia.org/wiki/Monads_in_functional_programmin

[5] https://github.com/tobgu/pyrsistent

[6] https://pythonhosted.org/pysistence/

[7] https://github.com/kachayev/fn.py

[8] https://github.com/jasondelaat/pymonad

[9] https://github.com/dbrattli/oslash

[10] https://www.scs.stanford.edu/11au-cs240h/notes/performance.html