Monday 5 August 2013

Weeks 6-7


Starting off with the good news that I have successfully passed the mid-term evaluation for GSoC! :)

Onwards: as mentioned in the previous post, Richard and I have been working on a "sandbox" application that simulates the communication interface between all the components in the system. It has been our primary focus for the past few weeks, and it is slowly evolving into a good "generic" interface which will hopefully be usable directly in our mm-rest application.

Data Sync and Layered Architecture

Data sync has been our primary concern during this time. Thus far, we have designed and implemented the basic CRUD (Create, Read, Update, Delete) operations in the application, for both locally and remotely backed objects.


The architecture of the system is layered. Each layer can (and most likely will) run on a different machine with its own separate database, but a layer might also be local, with its data backed by the same database. So we had to provide for both of these cases.

It works something like this:

               Inner Layer   <------------- Middle Layer <------------ Outer Layer

Although we can stack as many layers as we like, each layer adds a LOT of overhead in terms of database operations and HTTP requests.

The interface of this architecture has been made a lot more generic, and I hope to clean up the few hacks I still have so that it behaves like a seamless library for any locally or remotely backed model.

This layering, although present in the system from the beginning, has been made a little more explicit, with each layer having a relation with the one below it. For each model, there exists a mirror copy at each layer (the three layers can have different fields, but each pair of adjacent layers should share at least one common field which can be used to query).


For the locally backed objects, that field is a OneToOne or ForeignKey relation. For the remotely backed objects, it's a partial URL unique to the related object, which is generated when the object is created and establishes the relation.

The problem at this point is that, due to the machinery present in Django and DRF, we are currently making a lot more save operations than we should and, even more problematic, more HTTP requests.

Apart from that, there are use cases which haven't been looked at properly yet, for example, the case where we might have to propagate immediate updates to the upper layers. For now, this sync will only behave as a *cache* which is updated if the current layer has nothing in it, or if some other criterion is satisfied (object expired because it's too old, periodic cache update etc).
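To make the cache-like behaviour a bit more concrete, here is a minimal sketch in plain Python (no Django or DRF involved). The `Layer` class, its `below` link, the `max_age` expiry and the `fetches` counter are all names made up for this illustration, not anything from our actual code; a call to the layer below stands in for what would be an HTTP request in the remote case.

```python
import time


class Layer:
    """Hypothetical layer: its own store plus an optional layer below it."""

    def __init__(self, name, below=None, max_age=60.0, clock=time.monotonic):
        self.name = name
        self.below = below        # the next layer inwards, or None for the innermost
        self.max_age = max_age    # seconds before a cached copy counts as too old
        self.clock = clock        # injectable so the demo can fake time
        self.store = {}           # key -> (value, time it was stored)
        self.fetches = 0          # how often we had to ask the layer below

    def put(self, key, value):
        self.store[key] = (value, self.clock())

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None:
            value, stored_at = entry
            # The innermost layer is the source of truth, so it never expires.
            if self.below is None or self.clock() - stored_at <= self.max_age:
                return value
        if self.below is None:
            raise KeyError(key)
        # Nothing here, or the copy is too old: pull from the layer below.
        self.fetches += 1
        value = self.below.get(key)
        self.put(key, value)
        return value


# Demo with a fake clock so expiry is deterministic.
now = [0.0]
inner = Layer("inner", clock=lambda: now[0])
middle = Layer("middle", below=inner, max_age=10, clock=lambda: now[0])
outer = Layer("outer", below=middle, max_age=10, clock=lambda: now[0])

inner.put("user/1", "alice")
first = outer.get("user/1")    # misses in outer and middle, found in inner
inner.put("user/1", "alicia")  # the update is NOT pushed to the upper layers
cached = outer.get("user/1")   # still served from the outer layer's copy
now[0] = 11.0                  # both cached copies are now older than max_age
fresh = outer.get("user/1")    # refetched all the way from the inner layer
```

Note how the update to the inner layer only becomes visible further out once the cached copies expire, which is exactly the "no immediate propagation" limitation described above.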

Diving into the ORM

I spent a lot of the past couple of weeks fighting against the Django REST Framework (DRF), where the behavior I wanted was not working because of something DRF handles internally. The major problems in that regard were filtering objects and getting ALL objects at once.

On the surface, the two look like separate behaviors that should be easy to customize/extend. It was not quite that easy: overriding all() proved to be very difficult and caused my tests to fail. This happened because, internally, DRF made a call to all() for every data serialization that happened.

So instead of hacking all(), we had to go around it and change the DRF Viewsets and the filtering backend that DRF uses (a separate library called django-filter) to use querysets directly and not call all() unless they absolutely *have* to.

This made things easy: now we could just define all() in terms of a filter() that filters nothing. :)
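The idea is small enough to show with a toy stand-in. This is only a sketch in plain Python over a list of dicts, not a real Django QuerySet; `ListQuerySet` and its helper `_matches` are hypothetical names for the illustration.

```python
class ListQuerySet:
    """Toy queryset over a list of dicts (not the Django class)."""

    def __init__(self, rows):
        self._rows = list(rows)

    @staticmethod
    def _matches(row, conditions):
        # With no conditions the loop body never runs, so every row matches.
        for field, wanted in conditions.items():
            if row.get(field) != wanted:
                return False
        return True

    def filter(self, **conditions):
        kept = [row for row in self._rows if self._matches(row, conditions)]
        return ListQuerySet(kept)

    def all(self):
        # all() is just a filter() that filters nothing.
        return self.filter()

    def __iter__(self):
        return iter(self._rows)

    def __len__(self):
        return len(self._rows)


objects = ListQuerySet([
    {"id": 1, "kind": "list"},
    {"id": 2, "kind": "user"},
    {"id": 3, "kind": "list"},
])
everything = objects.all()
only_lists = objects.filter(kind="list")
```

With this shape, any custom fetching logic only has to live in filter(), and all() picks it up for free.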

During the course of struggling against all(), I learned quite a few things about the Django ORM: how QuerySets work, how to override them, and how to easily make them work with Managers. In order to stick to the DRY principle, I also searched around for a hack (there is nothing explicit in the ORM regarding this), and ended up using django-model-utils, an awesome little library that defines a few convenient model utilities for Django.

One of them, PassThroughManager, made it easy to define everything in the QuerySet alone, and took care of making those same methods available on the Manager.
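For anyone wondering what "pass through" means here, this is a rough sketch of the general idea in plain Python. It is NOT how django-model-utils implements PassThroughManager, and `PostQuerySet` and `PassThroughSketch` are hypothetical names I made up for the example: custom methods are written once on the queryset, and the manager forwards any attribute it doesn't have to a fresh queryset.

```python
class PostQuerySet:
    """Toy queryset: the custom methods live here and only here."""

    def __init__(self, rows):
        self._rows = list(rows)

    def published(self):
        return PostQuerySet(r for r in self._rows if r["published"])

    def by_author(self, author):
        return PostQuerySet(r for r in self._rows if r["author"] == author)

    def titles(self):
        return [r["title"] for r in self._rows]


class PassThroughSketch:
    """Toy manager that forwards unknown attributes to its queryset."""

    def __init__(self, queryset_class, rows):
        self._queryset_class = queryset_class
        self._rows = rows

    def get_queryset(self):
        return self._queryset_class(self._rows)

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for methods the
        # manager does not define itself. Private names are not forwarded.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self.get_queryset(), name)


manager = PassThroughSketch(PostQuerySet, [
    {"title": "Weeks 4-5", "author": "me", "published": True},
    {"title": "Weeks 6-7", "author": "me", "published": True},
    {"title": "Draft", "author": "me", "published": False},
    {"title": "Guest post", "author": "richard", "published": True},
])

# published() is not defined on the manager, yet it works and still chains.
my_titles = manager.published().by_author("me").titles()
```

The point is that nothing has to be declared twice: adding a new method to the queryset makes it callable from the manager too.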

Other than that, I spent the time writing tests and finishing off the design documentation for our basic operations, and did the Rocky movie marathon on Sunday! :)
