Looking for Best Practice for Raw SQL

DVD Cover from Eddie Murphy - RAW
Not Quite This Raw

My application includes some existing tables from another, non-Django application. One of them does not have an atomic primary key, and I’m not in a position to change that. Therefore, I’m using one raw SQL to access the data.

 

Where I run into trouble is that the query returns data in columns, but not objects. That means that I can’t reference an object attribute using dot notation in my templates or views. Instead, I have to include each attribute that will be needed as a query column. This will limit what I or someone else can do in templates later without a mod to the query.

In an earlier application with a similar situation, I took the results from the query and built a dictionary of objects. Clumsy, hard-coded, and likely slow, but it did give me what I needed.

I’ve looked around the web, but haven’t found any practices to follow, so I hope my many readers (smirk) can offer some help.

Have you tackled this dilemma in the past? Or maybe read where someone else has discussed a possible solution? (I’ll also take wild guesses). Please leave a comment below or drop me a line. I’ll summarize the responses and my solution(s) in a future post.

Thanks!!

UPDATE:

Here’s how I’m doing it now:

def available_classes_query(username=None):
    """Run a simple query and produce a generator
    that returns the results as a bunch of dictionaries
    with keys for the column values selected.
    """
    from itertools import izip
    from django.db import connection

    query_string = """select tu.id as traininguser, cl.id as trainingclass, ts.id as trainingschedule
                                 -- big chunk of SQL cut for readability --"""

    cursor = connection.cursor()
    cursor.execute(query_string, [username])  ## parameter for query
    col_names = [desc[0] for desc in cursor.description]
    while True:
        row = cursor.fetchone()
        if row is None:
            break
        row_dict = dict(izip(col_names, row))

        try:
            row_dict['traininguser'] = TrainingUser.objects.get(id=row[0])
        except TrainingUser.DoesNotExist:
            row_dict['traininguser'] = None
        try:
            row_dict['trainingclass'] = TrainingClass.objects.get(id=row[1])
        except TrainingClass.DoesNotExist:
            row_dict['trainingclass'] = None
        try:
            row_dict['trainingschedule'] = TrainingSchedule.objects.get(id=row[2])
        except TrainingSchedule.DoesNotExist:
            row_dict['trainingschedule'] = None

        yield row_dict
    return

ANOTHER UPDATE:

I posted this question on the Django Users group of Google Groups, and the very obvious answer was suggested – use the .raw() method of the manager.  This was added in Django 1.2, and I remember reading about it.  The RawQuerySet returned by this method will contain objects – even though the SQL provided will only return columns from the row.  Pretty slick.

The reason I didn’t find this is because I’m still using version 1.0.4, so I limited my searches to the docs from that version.  Yet one more reason to upgrade.

Base64 Encoding of Images

Today I’m working on an XML feed of events into an enterprise calendar program.  One of the requirements is the base64 encoding of accompanying images. My system gives the option to store the URL to a photo, not to store the image locally.  A bit of a shortcut I know, but I didn’t want to mess with the bureaucracy of the server team to work out the storage a backup of the images.  Luckily, it’s no big deal to Python either way. For a locally stored images, one can convert to base64 pretty easily, given the file name.  Try this:

import base64

image = 'example.jpg'
image_64 = base64.encodestring(open(image,"rb").read())

Not much different working with a URL, as urllib provides the urlopen method to present an online resource with a .read() function:

import base64, urllib

image = urllib.urlopen('http://docs.python.org/_static/py.png')
image_64 = base64.encodestring(image.read())

In this example, I’m using the location of the little Python symbol on the top left of most python.org pages.  Even this small graphic converts to a lot of base64 characters.  Here is the result:

iVBORw0KGgoAAAANSUhEUgAAABAAAAAQCAYAAAAf8/9hAAAAAXNSR0IArs4c6QAAAAZiS0dEAP8A
/wD/oL2nkwAAAAlwSFlzAAALEwAACxMBAJqcGAAAAAd0SU1FB9gEGxE4IQYzJ14AAAI3SURBVDjL
ZZNPSFVBFIe/e9+zd3silBCl0SZoU4s2rVq0EB5tQip4UNvATVGu3QRBiyAi2iltWkgbF5EgRhFF
RpiWtrWIzDIV1Pzz7p15M2fmtvDevOqBw8DM9zvnN8ycgF3R/eDtM2mac96ZdrFNxBikqbRV+vHH
/ut9gAZczoe7C3gnF0f6au1OLM5avFi8d1Ea+JvAMSAq8nsKOGs5f2cYJ3Y7rc2PO4BqkS8DdD98
f9tbe1ysCoxOBo1qlEXHJWcM4b5KPU19zleA0o4Clx99eO3EdqVewHsCoFRugUoVghJO7A6H6Vx9
wdtYi27cr5x6dy/03nVtWTU7bWeZh6jNUcAiCaFTURl9A+gs56AviHzh3mnqtdPxm6knfQPLU7Ua
okASQq/agY7yDrG16Mba6Pz48NP56VdrgAApYObGaicPtkovToFLQBKA/WUxTe3FRk4san15aGKg
d3Dj560rrdGJS6FT0X9YYvLuiMKL1kAQOpHZ3PqfyZfP41+9PW1VfzX0RXFSECfgNEmSTgImdDru
F2O0E8vvqZG1auQubAsKooIYYHpGvwA2g+xndQBHgWa6cG0ih5cW/w6VvEq3nChwCoBvs+bL2Z7V
ceBHGTDAIrABpMVuhw+4OiLgLIglOLPYBTQAlfErIeCzjRVg1dtEb1kt5Omv+DTV2YssAN+zNdkz
C42N9brV8WdvYp07seOdM2Of1F3AAknW0AJpwN6IgEPAEaANaMlcbmZdl7KRBuAfAb+v//yMAJoA
AAAASUVORK5CYII=

My application doesn’t put any limitations on the size of the image being linked, so I may have to add logic to resize it to something sensible for the receiving program.  I surely don’t need a 5MB image in my XML stream!  Watch this space for updates. Finally, I need to put this converted data into the XML output.  Template tags to the rescue! In tempplatetags/filters.py:

@register.filter
def b64(url):
    import base64, urllib
    if url:
        try:
            i = urllib.urlopen(url)
            return base64.encodestring(i.read())
        except:
            return None
    return None

This tag will try to open the url and convert the contents.  Not much error checking here.  If it doesn’t work, the value of None is returned. An this is an example template file, called ucal.xml:

{% load filters %}<!--?xml version="1.0" encoding="utf-8"?-->

...

{% if e.photo_link %}{{ e.photo_link|b64 }}{% endif %}

(Note that the {% load filters %} directive is on the same line as the xml version declaration.) The b64 filter takes the supplied URL and returns the base64 encoded version.

UPDATE:

Learned that the calendar can also just take a URL to the photo as input. Oh well, at least I got to learn something new.

Change of Plan

I’ve been working on a project (hereafter called ‘The Project’) for almost two years, although not much lately.  When I switched the platform from a customer hosted solution to a SAS application, the complexity increased greatly, and the estimated time to finish (if I were actually working on it) has extended past any acceptable time frame.  Plus, I’m finding more and more companies that have already put up applications to fill this need – plenty of competition.

I have interests in other Django tasks, including a possible reusable application, conversion of previous code to a newer Django version, and a personal project.  However, my guilt on not focusing on The Project has kept me from these tasks.

Therefore, I am officially announcing that The Project is being put aside – maybe temporarily, maybe forever.  I’ll certainly keep the code around – lots of useful snippets there, but no more active development.

The Project is dead, long live the new projects!

Fun with Oracle and External Data

I’m playing around today with Django and Oracle, connecting my application to data from another application.

First thing I realized is that I cannot use a table or view that does not have a single column primary key.  Without a primary key, the Django Admin system complains loudly.  For those that do have a good key, it’s pretty easy to declare it to Django:

id_number = models.CharField(max_length=10,primary_key=True)

This will override Django’s default behavior of using ‘ID’ as the primary key field name.

Next, there’s a specific way that one declares the schema and table/view name for the existing structure.  Each has to be in quotes, with the entire name also in quotes.  Something like this:

    class Meta:
        db_table = u'"SCHEMA"."TABLE"'

Something else that I needed to do was to not allow changes to the external data.  This is enforced by Oracle security, but I also wanted to make it clear in my code that saves and deletes are not going to happen.  To do this, I create an abstract class upon which my other models are based, which includes the redefined save() and delete() methods that raise a NotImplemeted error.  These produce a nice, big error page when DEBUG=TRUE.

class AdvanceBase(models.Model):
    """
    Abstract base class for accessing tables and views in the external schema
    """

    class Meta:
        abstract = True

    def save(self): # No saving allowed
        raise NotImplementedError

    def delete(self): # No deletes allowed
        raise NotImplementedError

I decided to try setting up a relationship between two of the external tables.  This wasn’t really different than any other foreign key, with one exception.  I wanted to rename the field to follow Django conventions.  Orginally called SCHOOL_CODE, I changed it to SCHOOL – same as the foreign key object.  In order to still map to the correct DB column, the db_column attribute is used to declare the original name:

school = models.ForeignKey(School,blank=True,null=True,db_column='SCHOOL_CODE')

One more experiment – using a manager.  In this case I wanted to filter a table to include only living people.   The fact that I was using an external table did not change the method for this:

class PersonManager(models.Manager):
    def get_query_set(self):
        return super(PersonManager, self).get_query_set().filter(status='A')

Something else I learned was that the Django Admin will use the first manager listed for a model.  By placing the reference to the PersonManager ahead of the normal objects, the Admin only showed living people.

I’m sure there will be more later as I dig deeper into using external data.  I have an actual application related goal driving my interest, so I should get this solved soon.

As always, leave your comments below.

Applications, Models, Properties, and Circular Imports

I’ve been fighting a problem in my Django project for a while, and the solution I came up with seemed like I was settling for practicality over elegance.  This post is an attempt to rationalize my choices.

First, I love the idea of reusable applications.  I have incorporated a few from open source, including django photologue for the gallery, and the ease of use was amazing.  While I know that my work may not be general enough for public use, I do hope to be able to reuse some code in different parts of my project.  And who knows, maybe something useful to the world will come from it.

So, I divided my project into what seemed to be mostly independent applications, and started coding.

My problem came when I started heavy use of the @property decorator in my models.  Being able to reference a function using the dot notation is very handy (even though they aren’t available in functions that reference fields internally – like the ORM).  However, I found myself referencing models and fields in these two certain applications often, and I ended up with a circular import error.

I’d like to take a moment to point out that my Internet search on circular imports did not yield much help.  Several sites merely pointed out that a circular import was the result of poor coding, but offered no advice.  Finally, one post suggested moving the import to the end of models.py.  This solved the problem in my laptop development environment, but the same code would not run on my Dreamhost server (maybe it is importing in a different order).

Next, I tried to rewrite my property functions to reference the models from the other application only through the ORM – to avoid the need for an import.  This helped on the less complicated functions, but I couldn’t do it for the really useful, complicated ones.

My solution was to combine these two applications together.  The two are obviously very connected, and it was easy to make the case.  However, one of them could have been used in a more general setting, and now it is too specific to be presented to the public.  I guess I’ll have to earn my open source cred with something else.

The important thing here is that I am past this and am once again able to work on features instead of back end problems.

Am I being too theoretical? Are properties not the way to go? The comment section is waiting for your thoughts.

Django – Custom Queryset with a Formset

My goal was to apply a custom queryset to a foreign key field. When the field is in a single model form, it was pretty clear to me how to proceed. However, when the field in question is part of a ModelFormset, I wasn’t seeing the way. Oddly enough, it came to me right before bed one night – iterate through the forms in the formset to apply the queryset to each one. Here’s how I handled each case.

In this application, there is a one-to-many relationship between Event and Rsvp, and a one-to-many between Rsvp and Seat

(snippet from views.py)

    ## build the querysets
    event_queryset = Event.futures.filter(institution=inst_id)
    person_queryset = Person.objects.filter(institution=inst_id)    

    ## create the formset
    SeatFormSet    = inlineformset_factory(Rsvp, Seat, form=SeatForm, can_delete=True, extra=3)

    ...
   
    ## instantiate the form and formset
    rsvpform      = RsvpForm(instance=rsvp)
    seatformset    = SeatFormSet(instance=rsvp)

    ## Apply the queryset to the master form
    rsvpform.fields['event'].queryset = event_queryset

    ## Apply the queryset to each form in the formset
    for form in seatformset.forms:
        form.fields['person'].queryset = person_queryset

It’s important to note that the querysets must be applied each time the form and formset are instantiated in your view code in order for the validation to work correctly.

In the for statement, I first tried iterating on the formset itself, but I was told that it was not iterable. A little perusal of the Django code for formsets showed me that the forms variable in the model was the object to use instead.

I’d love to hear your comments on this technique.

Phone Numbers From Around the World

What do you do with phone numbers?

I’ve used some different tricks to validate phone numbers in the North American format (999-999-9999), but I’ve learned that we also have invitees from Europe and other locales.  The problem I have, and I think it is a common problem, is that I need to be able to accept all possible phone number formats while attempting to validate the format used.  The goal of course is to minimize entry mistakes by the users.

I would love to hear your thoughts in the comments.

Using the Python iCalendar Library

An application I’m working on uses the icalendar library (http://codespeak.net/icalendar/) to generate an ICS file of upcoming events that can be linked to another calendar program (I’ve tried Outlook and Google Calendar). One problem I was having was that I was unable to generate time zone aware times. Mine were all naive – also called ‘floating’ times – that would appear the same regardless of the calendar program’s timezone.

What I was trying to do was to create a new time zone in the ICS file, and then append the TZID parameter to each date/time value. It was not happening.

So, I dug into the source code for icalendar. This library is well documented with tests, and I soon found a solution – include a tzinfo value when supplying the date/time and icalendar will convert it to UTC and append a ‘Z’ to the end of the time to indicate this. icalendar supplies a class called LocalTimezone that can be used for this purpose. I tried it, and it worked! Here’s a simplified version of my code:

    from icalendar import Calendar, Event
    from datetime import datetime 
    from icalendar import LocalTimezone   
    
    cal = Calendar()
    
    cal.add('version', '2.0')
    cal.add('prodid', '-//test file//example.com//')
    cal.add('X-WR-CALNAME','Test Calendar ' )
    
    lt = LocalTimezone() # we append the local timezone to each time so that icalendar will convert
                         # to UTC in the output
    
    for ent in queryset:        
        event = Event()
        event.add('summary', ent.event_name)
        event.add('dtstart', datetime.combine(ent.event_date,ent.start_time).replace(tzinfo=lt))
        event.add('dtend', datetime.combine(ent.stop_date,ent.stop_time).replace(tzinfo=lt))
        event.add('dtstamp', ent.updated_on.replace(tzinfo=lt))
        event['uid'] = ent.pk  # should probably use a better guid than just the PK
        event.add('priority', 5)
        
        cal.add_component(event)
    
    return cal

And this is a sample event from the output:

BEGIN:VEVENT
DTEND: 20100714T184500Z
DTSTAMP:20100714T185936Z
DTSTART:20100719T174500Z
PRIORITY:5
SUMMARY:Gateway Production
UID:66
END:VEVENT

Note that my 1:45pm event in the US Eastern time zone (EDT) shows as 1745 in UTC.

Custom Widget – My First Experience

I hadn’t needed to write a custom widget before today, but I needed to have disabled options included in a <select> list. After a little research and a LOT of trial and error, this is what I came up with.

from django.forms.widgets import Select, force_unicode, flatatt, escape, mark_safe, chain, conditional_escape

class SelectDisabled(Select):
    def __init__(self, attrs=None, choices=(), disabled=[]):
        super(SelectDisabled, self).__init__(attrs, choices)
        self.disabled = list(disabled)

    def render(self, name, value, attrs=None, choices=()):
        if value is None: value = ''
        final_attrs = self.build_attrs(attrs, name=name)
        output = [u'<select%s>' % flatatt(final_attrs)]
        options = self.render_options(choices, [value], self.disabled)
        if options:
            output.append(options)
        output.append('</select>')
        return mark_safe(u'\n'.join(output))

    def render_options(self, choices, selected_choices, disabled_choices):
        def render_option(option_value, option_label):
            option_value = force_unicode(option_value)
            option_label = (option_value in disabled_choices) and (force_unicode(option_label) + ' - SOLD OUT') or force_unicode(option_label)
            selected_html = (option_value in selected_choices) and u' selected="selected"' or ''
            disabled_html = (option_value in disabled_choices) and u' disabled="disabled"' or ''
            return u'<option value="%s"%s%s>%s</option>' % (
                escape(option_value), selected_html, disabled_html,
                conditional_escape(option_label))
        # Normalize to strings.
        selected_choices = set([force_unicode(v) for v in selected_choices])
        disabled_choices = set([force_unicode(v) for v in disabled_choices])
        output = []
        for option_value, option_label in chain(self.choices, choices):
            if isinstance(option_label, (list, tuple)):
                output.append(u'<optgroup label="%s">' % escape(force_unicode(option_value)))
                for option in option_label:
                    output.append(render_option(*option))
                output.append(u'</optgroup>')
            else:
                output.append(render_option(option_value, option_label))
        return u'\n'.join(output)

And here’s how I used it in the form:

class B2CRegistrationStartForm(Form): 
    def __init__(self, disabled, *args, **kwargs):
        super(B2CRegistrationStartForm, self).__init__(*args, **kwargs)
        if disabled:
            self.fields['event'].widget.disabled = disabled
    event = ModelChoiceField(widget=SelectDisabled(),queryset=Event.b2c_web_reg.all(),empty_label=None)  

And here is the form creation in views.py:

    disabled = []
    for e in Event.b2c_sold_out.values_list('id'):
        disabled.append(e[0])
    form = B2CRegistrationStartForm( data=request.POST or None, disabled=disabled)

(The list of disabled IDs needs to be in a list, not a list of tuples)

It works fine, and I learned about custom widgets, but I don’t feel like I was very Object Oriented in my approach. The __init__() method uses super() well, but the other two basically replace the methods in the parent class. When I update to a newer version of Django (from the current 1.0.4), I’ll have to check this carefully and probably redo it.

Anyone have suggestions on how I could better implement this feature?

Simple Search in Django

I started programming (in BASIC, Pascal, and COBOL) before the Web was available, and it’s amazing to see how much the craft has changed with instant access to thousands of references. However, sometimes I throw up a search and get back the perfect solution, and I’m frustrated that I couldn’t figure out something so simple. It’s just too easy to search instead of think.

My example today is an easy search in Django. I have read about Haystack, Solr, and other search engines that require a daemon, special database structures, and other special things, but these are overkill for a search on a relatively small dataset.

My search returned this post from Toast Driven that outlined a way to append together a few Q objects to return a search result that uses multiple terms and multiple fields. Genius!

Thanks to Daniel at Toast Driven.