Google App Engine Django CSV upload, part 2

[In the previous part I talked about transforming CSV into GAE models. Now it’s time to upload that to your GAE application. Let’s see how that can be done.

Basic CSV input view & template

Let’s make the following view in csvup/views.py:

from django.http import HttpResponse, Http404

def default(request):
  return HttpResponse("it works")

Make an url pointers in csvup/urls.py as necessary to get to the above view:

from django.conf.urls.defaults import *

urlpatterns = patterns('csvup.views',
  (r'^$', 'default'),
)

Fire up your development server with python manage.py runserver and go to http://localhost:8000/. You should see our “it works” text.

Instead of this, let’s consider a text box where you could easily copy and paste (or even type for small number of entries) some CSV data to be uploaded. To do this, we’ll use some templating. Put this into csvup/templates/csvup/default.htm template:

<html>
<head>
<title>GAE custom CSV upload</title>
</head>
<body>
<h1>GAE custom CSV upload</h1>
</body>
</html>

and change the view to:

from django.shortcuts import render_to_response

def default(request):
  return render_to_response('csvup/default.htm')

Refresh your browser – you should see our new template rendered.

Django provides Forms for easy HTML form manipulation. We’ll use this to get the CSV the user enters. Here’s the form – put it in csvup/models.py:

from django.form import Form, CharField, Textarea

class CsvupForm(SlugForm):
  csv = CharField(required=True, widget=Textarea(attrs=dict(cols=80,rows=15)))

We need to add the following to the body of our template, too:

<!-- ... -->
<body>
<h1>GAE custom CSV upload</h1>
<form method="post" action="{% url csvup.views.default %}">
  {{ form.as_p }}
  <input type="submit"/>
</form>
</body>
<!-- ... -->

Change your view to include a newly used form:

from django.shortcuts import render_to_response
from csvup.models import Person, CsvupForm

def default(request):
  return render_to_response('csvup/default.htm', dict(form=CsvupForm())) 

Refresh your browser, you should see this:

This would allow us to enter copy & paste the CSV into the box, then import it by submitting the form.

Importing data

Csvup class that we made is model-agnostic. We will make our view in the same way. Let’s change the URL mappings like this:

urlpatterns = patterns('csvup.views',
  (r'^(?P<mklass>[^/]+)', 'default'),
)

and our view like this:

def default(request, mklass):
  if request.method == 'POST':
    rowsCnt = 0
    return render_to_response('csvup/uploaded.htm', dict(rowsCnt=rowsCnt, mklass=mklass)) 
  return render_to_response('csvup/default.htm', dict(form=CsvupForm(), mklass=mklass)) 

Add the following to csvup/templates/csvup/uploaded.htm:

<html>
<head>
<title>GAE custom CSV upload - finished</title>
</head>
<body>
You've just uploaded to the following model klass: {{ mklass }}<br/>
Go <a href="{% url csvup.views.default mklass %}">back</a> to upload some more.
</body>
</html>

and change the default.htm template like this to satisfy the url pattern change we made:

<!-- ... -->
<form method="post" action="{% url csvup.views.default mklass %}">
<!-- ... -->

This allows us to go to http://localhost:8000/csvup/csvup.models.Person and it will allow us to upload CSV to the Person model. Go try it out to confirm it’s working. You, of course, won’t get any uploads yet, but the base is there. Except one more step…

Python Class.forName

A bit Java-centric, won’t you say? Basically, the above link would lead us to getting mklass = “csvup.models.Person”. What’s wrong with that? It’s a string – we need a Python class. With a bit help from hasen j on Stack Overflow, we get this:

def getClass(kls):
  parts = kls.split('.')
  module = ".".join(parts[:-1])
  m = __import__(module)
  for comp in parts[1:]: 
    m = getattr(m, comp)
  return m

getClass(“csvup.models.Person”) will actually give us csvup.models.Person class. We are all set to glue up the pieces.

Actual upload

Here’s a view that assembles all the above:

from google.appengine.ext import db

# ...

def default(request, mklass):
  form = CsvupForm()
  if request.method == 'POST':
    form = CsvupForm(request.POST)
    if form.is_valid(): 
      csv = form.cleaned_data['csv']
      pyMklass = getClass(mklass)
      csvup = Csvup(csv, pyMklass)
      models = csvup.makeModels()
      db.put(models)
      rowsCnt = len(models)
      return render_to_response('csvup/uploaded.htm', dict(rowsCnt=rowsCnt, mklass=mklass)) 
  return render_to_response('csvup/default.htm', dict(form=form, mklass=mklass)) 

What it says is: if it’s a POST and the form is valid, parse the CSV into models, save them, then tell the user how many rows were uploaded.

Try it out – here’s the sample from the first part:

name,birthdate,countriesVisitedCount,everMarried
Joe,1990/10/01,3,n
Maria,1970/06/05,10,y

Put it there and you should get a nice message all has been uploaded.

Listing the Persons

Now we have all uploaded and we can see absolutely nothing different about the app’s output because – there’s no output… Let’s make a quick view to show that the uploads really worked:

def list(request, mklass):
  pyMklass = getClass(mklass)
  items = pyMklass.all().fetch(1000)
  return render_to_response('csvup/list.htm', dict(mklass=mklass, items=items)) 

one more url pattern:

urlpatterns = patterns('csvup.views',
  (r'^list/(?P<mklass>[^/]+)', 'list'),
  (r'^(?P<mklass>[^/]+)', 'default'),
)

and a new template:

<html>
<head>
<title>GAE model lister</title>
</head>
<body>
<body>
<h1>All {{ mklass }} ({{ items|length }})</h1>
{% for i in items %}
  {{ i }}<br/>
{% endfor %}
</body>
</body>
</html>

Point your browser to http://localhost:8000/csvup/list/csvup.models.Person and admire your newly uploaded Persons. This is also generic, so you can use it to list any model, even in your separate application, unrelated to CSV upload.

Using the usual file upload

Instead of using text box to enter the CSV, you can use the usual file upload. That’s easy with Django. First, we make a form:

from django.form import Form, CharField, Textarea, FileField

# ...

class FileupForm(SlugForm):
  csvfile = FileField(required=True)

Then we add the following view:

from csvup.models import Person, CsvupForm, FileupForm

# ...

def fileup(request, mklass):
  form = FileupForm()
  if request.method == 'POST':
    form = FileupForm(request.POST, request.FILES)
    if form.is_valid(): 
      csv = request.FILES['csvfile'].read()
      pyMklass = getClass(mklass)
      csvup = Csvup(csv, pyMklass)
      models = csvup.makeModels()
      db.put(models)
      rowsCnt = len(models)
      return render_to_response('csvup/uploaded.htm', dict(rowsCnt=rowsCnt, mklass=mklass)) 
  return render_to_response('csvup/fileup.htm', dict(form=form, mklass=mklass)) 

and we add the url pattern for it:

from django.conf.urls.defaults import *

urlpatterns = patterns('csvup.views',
  (r'^fileup/(?P<mklass>[^/]+)', 'fileup'),
  (r'^list/(?P<mklass>[^/]+)', 'list'),
  (r'^(?P<mklass>[^/]+)', 'default'),
)

In the template, nothing much changes except the enctype of the form, which is a must:

<html>
<head>
<title>GAE custom CSV upload - fileup</title>
</head>
<body>
<h1>GAE custom CSV upload - fileup</h1>
<form method="post" action="{% url csvup.views.fileup mklass %}" enctype="multipart/form-data">
  {{ form.as_p }}
  <input type="submit"/>
</form>
</body>
</html>

As you can see, the changes are minimal. Try it out now: http://localhost:8000/csvup/fileup/csvup.models.Person.

Happy CSV uploads!

That’s about the core of CSV uploads to GAE. Some considerations:

  • Obviously, you’d think about security – you don’t want bulk upload functionality available to everybody. Typically, you restrict CSV uploads to admin users or at least logged in users,
  • Size needs to be considered. As of now, GAE allows up to 10MB files to be uploaded. Plenty for many things, but might be restrictive if your appetites are bigger,
  • Error checking was not considered in these examples,
  • It’s not pretty, so if you are a designer type go ahead and make it beautiful.

Hope this helps.


Google App Engine Django CSV upload

Let’s consider a Google App Engine project. I’ll talk about google-app-engine-django project with Django in it. At the beginning, let’s create a sample project and a sample app in it:

python manage.py startapp csvup

Add the app to your settings.py:

INSTALLED_APPS = (
  # ...
  csvup,
}

and to the root urls.py:

urlpatterns = patterns('',
  # ...
  (r'^csvup/', include('csvup.urls')),
)

Let’s create the following model in it:

from google.appengine.ext import db
from appengine_django.models import BaseModel

class Person(BaseModel):
  name = db.StringProperty()
  birthdate = db.DateProperty()
  countriesVisitedCount = db.IntegerProperty()
  everMarried = db.BooleanProperty()

Now, say we have a legacy DB with 1000 people we’d like to put online into our shiny web app for others to view. You can export it to CSV from your DB, so let’s see how to import that into your application. Note that you can use GAE-supported remote_api, as described here. If you only need a straightforward upload occasionally or for initial datastore population, it’s probably better to go with the GAE-supported way. There are some situations for which you’d like to roll your own solution instead of the remote_api one, such as:

  • You’d like to provide non-admin users with a CSV-upload option,
  • Different transformations need to be performed before uploading the file,
  • CSV upload is just one of the options you’d like to start with and add other formats as you go on (e.g. zipped CSV or YAML).

We’ll start by creating a class:

class Csvup(object):
  pass

and start adding methods to it for necessary phases.

Parsing CSV

I’m going to stick with properly formatted CSV – dealing with improper CSV is a story on its own. It’s rather easy to set your conventions for this, but I’ll assume the following:

  • There’s a header in CSV which matches exactly with property names of our model,
  • Dates are in format %Y/%m/%d and
  • Booleans are represented as y/n.

Here’s a sample of a valid CSV for the above Person model:

name,birthdate,countriesVisitedCount,everMarried
Joe,1990/10/01,3,n
Maria,1970/06/05,10,y

First step is to parse the above into Python lists, for which we’ll create the following:

class Csvup(object):
  def __init__(self, csv):
    self.csv = csv
    self.descs = []

  def parseCsv(self):
    pass

Here’s the unit test for the above (using Python’s unittest module):

import unittest
import textwrap
from csvup.views import Csvup

class CsvupTest(unittest.TestCase):
  def testCsvParsing(self):
    incsv = textwrap.dedent("""\
      name,birthdate,countriesVisitedCount,everMarried
      Joe,1990/10/01,3,n
      Maria,1970/06/05,10,y"""
    )
    csvup = Csvup(incsv)
    descs = csvup.parseCsv()
    self.assertEquals(len(descs), 2)

    desc1 = descs[0]
    vals1 = sorted([d[1] for d in desc1])
    self.assertEquals(vals1, ['1990/10/01', '3', 'Joe', 'n'])

The above uses a simple inline CSV document, parses it using our Csvup class and compares the result. An interesting thing is textwrap.dedent method, which allows you to strip the leading spaces from a Python multiline string (curtesy of bignose from here). Put the above in csvup/tests.py and run:

python manage.py test

from the root of the project. It should run all tests – the ones supplied by the frameworks and the one we just created. Instead, you can run:

python manage.py test csvup

to run just our tests. If all is good, we should have a failed test right now, as we don’t have the implementation yet.

Python enables us to parse this rather simply with it’s csv module. Here’s a sample implementation that passes the above test:

import os
import csv
from csvup.models import Person

class Csvup(object):
  # ...
  def parseCsv(self):
    lines = self.csv.split(os.linesep)
    csvr = csv.reader(lines)
    header = csvr.next()
    descs = self.descs
    for row in csvr:
      descs.append(zip(header, row))
    return descs

Each row in descs is a list of tuples. Each tuple is (property name, value). For simplicity, put this inside your views.py (or change the test above) and run the tests – the only test so far will pass.

Populating model

Now that we got our CSV Pythonized, we need to populate our models. Here’s the additional method in Csvup class and an appropriate test method:

class Csvup(object):
  def __init__(self, csv):
    # ...
    self.models = []

  # ...
  def makeModels(self):
    self.parseCsv()
    return self.models

class CsvupTest(unittest.TestCase):
  def setUp(self):
    incsv = textwrap.dedent("""\
      name,birthdate,countriesVisitedCount,everMarried
      Joe,1990/10/01,3,n
      Maria,1970/06/05,10,y"""
    )
    self.csvup = Csvup(incsv)

  def testCsvParsing(self):
    descs = self.csvup.parseCsv()
    self.assertEquals(len(descs), 2)

    desc1 = descs[0]
    vals1 = sorted([d[1] for d in desc1])
    self.assertEquals(vals1, ['1990/10/01', '3', 'Joe', 'n'])

  def testModelPopulation(self):
    models = self.csvup.makeModels()
    self.assertEquals(len(models), 2)

    model2 = models[1]
    self.assertEquals(model2.name, 'Maria')
    self.assertEquals(model2.birthdate, date(1970, 6, 5))
    self.assertEquals(model2.countriesVisitedCount, 10)
    self.assertEquals(model2.everMarried, True)

Note I refactored CsvupTest to do a set up in the setUp method, which gets called before any of the tests is executed. That way we get a fresh Csvup instance each time. Also, Csvup.makeModels is going to call parseCsv in order to guarantee that the CSV is parsed before it attempts to do any model making on its own.

What we are going to do is loop through all the property names supplied and populate the respective property on the model. Before that, the string needs to be converted to the appropriate model type, which depends on the property. To make it simple and consistent:

  • For booleans, we’ll use y = True, n = False,
  • For dates, we’ll fix the format to %Y/%m/%d. We’ll use datetime module.

For this, we will add the following test:

# ...
from datetime import date, datetime

class CsvupTest(unittest.TestCase):
  # ...
  def testConverters(self):
    cs = self.csvup.converters
    self.assertTrue(cs[int](55), 55)
    self.assertTrue(cs[float](1.23), 1.23)
    self.assertTrue(cs[bool]('y'))
    self.assertFalse(cs[bool]('n'))
    self.assertEqual(cs[date]('1990/10/01'), date(1990, 10, 1))
    self.assertEqual(cs[datetime]('1990/10/01 12:34:56'), datetime(1990, 10, 1, 12, 34, 56))

Csvup.converters is a dict that will have the necessary converters. All non-existent are treated as strings. Here’s a sample implementation:

# ...
from datetime import date, datetime

class Csvup(object):
  def __init__(self, csv):
    # ...
    self.makeConverters()
  
  def makeConverters(self):
    self.converters = {
      bool:     lambda s: s == 'y',
      int:      lambda s: int(s),
      float:    lambda s: float(s),
      date:     lambda s: datetime.strptime(s, '%Y/%m/%d').date(),
      datetime: lambda s: datetime.strptime(s, '%Y/%m/%d %H:%M:%S'),
    }

Running the tests will result in two green lights, so we are good so far. Now, what’s left is to make a model loading, i.e. pass the testModelPopulation test. Here’s a sample one:

class Csvup(object):
  # ...
  def makeModels(self):
    self.parseCsv()
    for desc in self.descs:
      self.models.append(self.makeModel(desc))
    return self.models 

  def makeModel(self, desc):
    instance = self.mklass()  
    for (name, strval) in desc:
      prop = self.mklass.properties()[name]
      converted = strval
      if prop.data_type in self.converters:
        converted = self.converters[prop.data_type](strval)
      instance.__setattr__(name, converted)
    return instance

Running the tests should give you three green lights.

Progress

So far we have:

  • Parsed a CSV,
  • Converted all string values to appropriate data types and
  • Created models for each row and stored this in a Python list.

In the next post I’ll continue with the implementation of the CSV uploader to GAE.