Using “in” in QuerySets in Django
So the purpose of this post is just to look at a neat way of solving a problem in Django, and explain its mechanics and how it was put together.
In today’s example, the goal was pretty straightforward. We have a model class, Topic, that has a one-to-many relationship to a ThreadedComment model class. Each ThreadedComment for each Topic, has a User associated with it; furthermore the Topic itself as a User associated with it (the creator of the topic).
The issue here is that it’s not actually a straightforward relationship between Topic and ThreadedComment, due to some complicating factors beyond the scope of this discussion. Suffice it to say what we’re looking for is a list of users who are participants in the discussion, inclusive of the topic creator. But we’re not looking for just any old list — what we need is a QuerySet, for the reason that we are making use of some specialized tags in our templates that require data to be in a QuerySet.
Here’s a quick example of what we want the interface to look like in the template:
<ul>
{% for participant in topic.participants %}
<li>{{ participant.name }}</li>
{% endfor %}
</ul>
So let’s do this step by step.
First, let’s get a list of the ThreadedComment objects we need:
ThreadedComment.public.all_for_object(topic))
The all_for_object() method is one of the “complications” we were referring to above. All we care about is that it returns a set of ThreadedComment objects for a given topic.
Okay, but now that we have this set of objects, how are we going to return a QuerySet that yields us User objects?
My co-worker actually showed this to me, so I’ll share it here. We’ll use in keyword argument in filter(), which maps to the SQL IN clause. The Django documentation has a simple example:
Entry.objects.filter(id__in=[1, 3, 4])
is equivalent to:
SELECT ... WHERE id IN (1, 3, 4)
So that’s just what we’re going to do: pass a list of user ids as an in keyword argument to identify the users we want.
First, let’s extend our original QuerySet to get the values we need as a list:
ThreadedComment.public.all_for_object(self).values_list('user_id', flat=True)
One thing is missing — we need to include the user id of the topic creator, as I mentioned above. We’ll just evaluate the QuerySet to a list and then add the topic creator to it:
list(ThreadedComment.public.all_for_object(self).values_list('user_id', flat=True)) + [self.creator.id]
Now let’s take the list of ids and pass it to a QuerySet that we’ll return User objects. Our list goes in the pk__in keyword argument:
User.objects.filter(
pk__in=list(
list(ThreadedComment.public.all_for_object(self).values_list('user_id', flat=True)) + [self.creator.id]
)
)
Now let’s add this as a method participants() to the Topic model and throw a @property decorator on it so we can call it as an attribute:
@property
def participants(self):
return User.objects.filter(
pk__in=list(
list(ThreadedComment.public.all_for_object(self).values_list('user_id', flat=True)) + [self.creator.id]
)
)
And that’s all there is to it.
What I love about this approach is that it’s a pattern that fits in to so many places (so many that I almost feel like it deserves a higher level abstraction within the QuerySet API). There are variations that make it slightly more efficient as well; for example, if I didn’t need to append the topic creator id in my example, then I could have passed the list of ids as a query as opposed to evaluating it:
User.objects.filter(
pk__in=ThreadedComment.public.all_for_object(self).values_list('user_id', flat=True)).query
)
The advantage here is that this is evaluated as a subselect statement, so it only requires a single database call.
Anyhow, this technique is really fantastic when you have a set of related objects that for whatever reason you aren’t able to easily obtain as a QuerySet. Enjoy.

1 comment
any idea what to do when you have more than 1000 user ids, and the sqlite backend throws “too many SQL variables” ??? I’m stuck….