Wednesday 21 August 2013

Weeks 8-10


Progress has been slower than I expected these past few weeks. I have been working on integrating the architecture and layer design that was prototyped in the sandbox application into the public_rest code. The result wasn't as neat as I had hoped, and quite a few considerations had to be taken into account in the context of the MM-Core API.

The primary problem during this period has been the "transformation" of data. The way I have gone about it (which, admittedly, can be improved a lot) is to use the former mailman.client classes as "data adaptors": objects that wrap around the data returned from Core. These adaptors are used by the CoreInterface to perform the necessary CRUD operations on the REST API, and are associated with the local models as well.

I was then able to create some generic interface functions that use per-model dispatch functions and return adaptors.
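To make that concrete, here is a rough sketch of the adaptor idea. All of the class and function names below are invented for illustration; the real public_rest and mailman.client classes differ.

    class Adaptor(object):
        """Wrap the raw dictionary deserialized from a Core response."""

        def __init__(self, data):
            self._data = data

        def __getattr__(self, name):
            # Expose the underlying Core fields as plain attributes.
            try:
                return self._data[name]
            except KeyError:
                raise AttributeError(name)

    class DomainAdaptor(Adaptor):
        pass

    class ListAdaptor(Adaptor):
        pass

    # A generic interface function: dispatch on the model type and
    # hand back adaptors instead of raw dictionaries.
    ADAPTOR_FOR = {'domain': DomainAdaptor, 'list': ListAdaptor}

    def wrap(object_type, data):
        return ADAPTOR_FOR[object_type](data)

So `wrap('domain', {'mail_host': 'example.com'}).mail_host` would hand back `'example.com'`, and the CoreInterface only ever deals in adaptors, never raw dictionaries.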

Certain models, like Memberships, are actually exposed as multiple endpoints at the REST level. A Membership at `mm-rest` is an object with a "role" of either member, moderator, or owner. At Core, however, there are separate endpoints for all three. Those three are combined at the Interface (arguably, this job should belong to the data adaptor): when we query memberships, we create our adaptor objects using all three endpoints and return the combined adaptor list.
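In sketch form it might look like this. The roster paths are my assumption of Core's endpoint layout, MembershipAdaptor is a stand-in class, and the `connection` object is assumed to have a `call()` method in the style of mailman.client:

    ROLE_ENDPOINTS = {
        'member': 'lists/{0}/roster/member',
        'moderator': 'lists/{0}/roster/moderator',
        'owner': 'lists/{0}/roster/owner',
    }

    class MembershipAdaptor(object):
        """Stand-in adaptor carrying the entry data plus its role."""
        def __init__(self, data, role):
            self.data = data
            self.role = role

    def get_memberships(connection, fqdn_listname):
        """Query all three role endpoints and return one adaptor list."""
        adaptors = []
        for role, path in ROLE_ENDPOINTS.items():
            response, content = connection.call(path.format(fqdn_listname))
            for entry in content.get('entries', []):
                adaptors.append(MembershipAdaptor(entry, role=role))
        return adaptors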

While working on the prototype, I was using a uniform REST API exposed via the Django-Rest-Framework, which made things easy. This isn't the case with Core, whose REST API is less consistent in its terminology and in its access to primary and secondary resources.

Some entities are models unto themselves at the mm-rest layer, but are considered secondary resources at the API level. Case in point: List Settings. Each MailingList model has a foreign key to a ListSettings model. The model follows the same set of CRUD operations as everything else at the mm-rest layer, but at Core it is exposed as a sub-resource of the list, via the `/lists/{fqdn_listname}/config` endpoint.

To handle such inconsistencies, I pass around an "object_type" parameter, and resources are created based on the type of the object. It is not an elegant solution, but it works for my purpose, so I am keeping it for now.
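Something along these lines, with the endpoint strings assumed from the description above rather than taken from the actual code:

    def build_endpoint(object_type, **kwargs):
        """Map an object_type to its Core resource path."""
        if object_type == 'domain':
            return 'domains/{0}'.format(kwargs['mail_host'])
        elif object_type == 'list':
            return 'lists/{0}'.format(kwargs['fqdn_listname'])
        elif object_type == 'listsettings':
            # A secondary resource at Core: it lives under its list.
            return 'lists/{0}/config'.format(kwargs['fqdn_listname'])
        raise ValueError('Unknown object_type: {0}'.format(object_type))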

There have also been problems with things that Core simply does not allow me to do (like creating email addresses), so those are on hold for now and will (possibly) be addressed after GSoC is over.

For now, I am able to handle saves for primary objects like Domains, MailingLists, and Memberships, both locally and in Core. The job for this week is to finish the other entities and work on "filter" (and "get" as an extension of it) as well. Once that is complete, I think implementing user authentication and a consistent design for our own DRF layer will be the only things left to do in this project.

One month left in GSoC. Time is passing quickly!

Monday 5 August 2013

Weeks 6-7


Starting off with the good news that I have successfully passed the mid-term evaluation for GSoC! :)

Onwards: as mentioned in the previous post, Richard and I have been working on a "sandbox" application that simulates the communication interface between all the components in the system. That has been our primary focus for the past few weeks, and it is slowly evolving into a good "generic" interface, which will hopefully be usable directly in our mm-rest application.

Data Sync and Layered Architecture

The problem of data sync has been our primary concern and focus during this time. So far, we have designed and implemented operations for the basic CRUD (Create, Read, Update, Delete) facilities in the application, for both locally and remotely backed objects.


The architecture of the system is layered. Each layer can (and most likely will) be running on a different machine with its own separate database, but a layer might also be local, with its data backed up in the same database. So we had to provide for both of these cases.

It works something like this:

               Inner Layer   <------------- Middle Layer <------------ Outer Layer

Although we can add as many layers as we like, each layer brings a LOT of overhead in terms of database operations and HTTP requests.

The interface of this architecture has been made a lot more generic, and I hope to fix a few remaining hacks so that it behaves like a seamless library for any locally and remotely backed models.

This layering, although present in the system from the beginning, has been made a little more explicit, with each layer having a relation with the one below it. For each model, there exists a mirror copy at each layer (each of the three layers can have some fields of its own, but adjacent layers must share at least one common field that can be used to query across them).


For locally backed objects, that field is a OneToOne or ForeignKey relation. For remotely backed objects, it's a partial URL unique to the related object, which is generated when the object is created and establishes the relation.
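As a hedged illustration (the model and field names are made up, not the actual public_rest schema), the two linking styles might look like this in Django:

    from django.db import models

    class InnerMailingList(models.Model):
        fqdn_listname = models.CharField(max_length=255, unique=True)

    class LocalMailingList(models.Model):
        # Locally backed: a direct relation to the mirror copy below.
        inner = models.OneToOneField(InnerMailingList,
                                     on_delete=models.CASCADE)
        fqdn_listname = models.CharField(max_length=255, unique=True)

    class RemoteMailingList(models.Model):
        # Remotely backed: a partial URL, generated at creation time,
        # that uniquely identifies the related object on the other side.
        partial_url = models.CharField(max_length=255, unique=True)
        fqdn_listname = models.CharField(max_length=255, unique=True)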

The problem at this point is that, due to the machinery present in Django and DRF, we are currently making many more save operations and, even more problematically, more HTTP requests.

Apart from that, there are use cases which haven't been looked at properly yet; for example, the case where we might have to propagate immediate updates to the upper layers. For now, this sync will only behave as a *cache*, which is updated if the current layer has nothing in it, or if some other criterion is satisfied (the object expired because it's too old, a periodic cache update, etc.).
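A minimal sketch of that cache-like read path, assuming a `fetch_from_upper_layer()` helper (not a real function, just a stand-in for the HTTP round trip):

    def cached_get(model, **lookup):
        """Return the object from this layer, falling back to the one above."""
        try:
            obj = model.objects.get(**lookup)
        except model.DoesNotExist:
            # Nothing at this layer: ask the layer above and store a copy.
            # A fuller version would also re-fetch when the local copy
            # is stale (too old, periodic refresh, and so on).
            data = fetch_from_upper_layer(model, **lookup)  # assumed helper
            obj = model.objects.create(**data)
        return obj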

Diving into the ORM

I spent a lot of the past couple of weeks fighting against the Django REST Framework, figuring out why the behavior I desired was not working because of something DRF handles internally. The major problem in that regard was filtering objects and getting ALL objects at once.

On the surface, the two look like separate behaviors that should be easy to customize/extend. It was not quite that easy, since overriding all() proved very difficult and caused my tests to fail. This happened because, internally, DRF made a call to the all() function for every data serialization that happened.

So instead of hacking all(), we went around it and changed the DRF ViewSets and the DRF filtering backend itself (another library, known as django-filter) to use querysets directly and not call all() unless they absolutely *have* to.

This made things easy: now we could just define all() in terms of a filter() that filters nothing. :)
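In code, the trick is tiny; something like this on a custom QuerySet (the class name and the body of filter() are illustrative):

    from django.db.models.query import QuerySet

    class LayeredQuerySet(QuerySet):
        def filter(self, *args, **kwargs):
            # The local/remote sync logic hangs off filter()...
            return super(LayeredQuerySet, self).filter(*args, **kwargs)

        def all(self):
            # ...so all() is simply a filter() that filters nothing.
            return self.filter()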

During the course of struggling against all(), I learned quite a few things about the Django ORM: how QuerySets work, how to override them, and how to make them work easily with Managers. In order to maintain the DRY principle, I also searched around for a hack (there is nothing explicit in the ORM regarding this) and ended up using django-model-utils, an awesome little library that defines a few convenient model utilities for Django.

One of them, PassThroughManager, made it easy to define everything on the QuerySet itself, and took care of exposing those methods on the Manager.
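Hooking the queryset class sketched above onto a model then looks roughly like this, using PassThroughManager.for_queryset_class from django-model-utils (the model itself is again just an example):

    from django.db import models
    from model_utils.managers import PassThroughManager

    class MailingList(models.Model):
        fqdn_listname = models.CharField(max_length=255, unique=True)

        # Methods defined on LayeredQuerySet become available on the
        # manager too: MailingList.objects.all(), .filter(), etc.
        objects = PassThroughManager.for_queryset_class(LayeredQuerySet)()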

Other than that, I spent the time writing tests, finishing off the design documentation for our basic operations, and doing the Rocky movie marathon on Sunday! :)