A Python Functional Programming Example
So I was working on my ref-man emacs package and I had made some additions which incorporate a local python flask server running on it which fetches data from from dblp API, arXiv API and also I’m adding semantic scholar support. Originally there was only dblp calls so I had to modify the functions and I took a functional programming approach, by generating partial functions and using them to modify the existing code as I modified my existing implementation.
The old dblp function was dblp_old which is still in the file
def dblp_old():
# checking request
if not isinstance(request.json, str):
data = request.json
else:
try:
data = json.loads(request.json)
except Exception:
return json.dumps("BAD REQUEST")
j = 0
content = {}
# The requests are fetched in parallel in batches which can be configured
while True:
_data = data[(args.batch_size * j): (args.batch_size * (j + 1))].copy()
for k, v in content.items():
if v == ["ERROR"]:
_data.append(k)
if not _data:
break
q = Queue()
threads = []
for d in _data:
threads.append(Thread(target=dblp_fetch, args=[d, q],
kwargs={"verbose": args.verbose}))
threads[-1].start()
for t in threads:
t.join()
content.update(_dblp_helper_old(q))
j += 1
return json.dumps(content)So a function dblp_fetch is given to the threads with the data and a queue as args. All it’ll do is push responses to HTTP requests on to the queue and a helper function _helper will process the results after fetching from the queue.
def _helper(q):
content = {}
while not q.empty():
query, response = q.get()
if response.status_code == 200:
result = json.loads(response.content)["result"]
if result and "hits" in result and "hit" in result["hits"]:
content[query] = []
for hit in result["hits"]["hit"]:
info = hit["info"]
authors = info["authors"]["author"]
if isinstance(authors, list):
info["authors"] = [x["text"] for x in info["authors"]["author"]]
else:
info["authors"] = [authors["text"]]
content[query].append(info)
else:
content[query] = ["NO_RESULT"]
elif response.status_code == 422:
content[query] = ["NO_RESULT"]
else:
content[query] = ["ERROR"]
return contentFairly simple as you can see, but now if I want to add more APIs, I either write separate _fetch and _helper functions for each of them or modify the existing interface. Adding new functions is easier and hackier but error prone in the long run because:
- You’ll end up copy pasting the code which will lead to errors
- Changes in the implementation where that code interacts with the server will have to be replicated across
So I changed the functions in three ways:
Removed the multithreaded part so that the functions is executed parallely and responses handled by
_helperfor arbitrary functions. I could have used a threadpool or processpool (See), but I already had the threads implementation and it was working fine.q_helpernow takes three functions as parameters in addition to the queuedef q_helper(func_success, func_no_result, func_error, q): content = {} while not q.empty(): query, response = q.get() if response.status_code == 200: func_success(query, response, content) elif response.status_code == 422: func_no_result(query, response, content) else: func_error(query, response, content) return contentSo it’s entirely indpendent of whether we fetch from dblp or arxiv or any other source. It doesn’t even handle any of the responses itself but passes them on to the functions. Data is shared with the
dictcontent. But that leaves us with a tiny problem: thehelperfunction inpost_json_wrapperonly takes theqas an argument while our newq_helperrequires 3 functions and theq.Now there are two ways to handle that: we can either pass all the functions to the
post_json_wrapperor pass a new function which already has the three functions set. After all there will be a separate helper for each API. So this is where the final piece of the functional puzzle falls in._dblp_helper = partial(q_helper, _dblp_success, _dblp_no_result, _dblp_error)what we’re doing is defining a new function composed of existing functions
q_helper,_dblp_successetc. but with everything but theqalready set.arxiv_helperis defined similarly and results in fewer paramters being sent topost_json_wrapperwhich is almost always a good thing. See functools module for information aboutpartial.
So we’ve moved from an API specific threaded function and helper to an API agnostic one while still keeping the original ideas of the implementation intact! Pretty neat! I might write about how functional programming can help reduce software complexity in another post, as even though it looks complex you should keep in mind that most of the functions are stateless. They don’t change over time. That leaves us with fewer things to worry about.
