A good data model for finding a user's favorite stories

Original Design

Here's how I originally had my Models set up:

class UserData(db.Model):
    user = db.UserProperty()
    favorites = db.ListProperty(db.Key) # list of story keys
    # ...

class Story(db.Model):
    title = db.StringProperty()
    # ...

On every page that displayed a story I would query UserData for the current user:

user_data = UserData.all().filter('user =', users.get_current_user()).get()
story_is_favorited = (story in user_data.favorites)

New Design

After watching this talk: Google I/O 2009 - Scalable, Complex Apps on App Engine, I wondered if I could set things up more efficiently.

class FavoriteIndex(db.Model):
    favorited_by = db.StringListProperty()

The Story Model is the same, but I got rid of the UserData Model. Each instance of the new FavoriteIndex Model has a Story instance as a parent, and each FavoriteIndex stores a list of user IDs in its favorited_by property.

If I want to find all of the stories that have been favorited by a certain user:

index_keys = FavoriteIndex.all(keys_only=True).filter('favorited_by =', users.get_current_user().user_id())
story_keys = [k.parent() for k in index_keys]
stories = db.get(story_keys)

This approach avoids the serialization/deserialization that's otherwise associated with the ListProperty.
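As a rough mental model of why the equality filter on favorited_by is an index lookup rather than a pass over every list: the datastore writes one index entry per value of a list property. The pure-Python sketch below only imitates that idea in memory. ToyListIndex and its methods are hypothetical names made up for illustration, not App Engine APIs, and keys are modelled as plain tuples.

```python
from collections import defaultdict


class ToyListIndex(object):
    """Hypothetical in-memory stand-in for a list-property index."""

    def __init__(self):
        # One bucket per distinct list value: value -> set of entity keys.
        self._entries = defaultdict(set)

    def put(self, entity_key, values):
        # Write one index entry per value in the list.
        # (Simplification: re-putting an entity does not remove stale entries.)
        for value in values:
            self._entries[value].add(entity_key)

    def keys_where_equal(self, value):
        # A single lookup in the index; no entity's list is walked.
        return sorted(self._entries.get(value, ()))


index = ToyListIndex()
# Keys are modelled as (parent kind, parent id, kind, key name) tuples.
index.put(('Story', 1, 'FavoriteIndex', 'favs'), ['user_a', 'user_b'])
index.put(('Story', 2, 'FavoriteIndex', 'favs'), ['user_b'])
index.put(('Story', 3, 'FavoriteIndex', 'favs'), ['user_c'])

index_keys = index.keys_where_equal('user_b')
# The parent Story key is just the leading part of each child key.
story_keys = [k[:2] for k in index_keys]
```

The trade-off in this model is on the write side: an entity with a long favorited_by list costs one index entry per user each time it is saved.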

Efficiency vs Simplicity

I'm not sure how efficient the new design is, especially after a user decides to favorite 300 stories, but here's why I like it:

  1. A favorited story is associated with a user, not with her user data

  2. On a page where I display a story, it's pretty easy to ask the story if it's been favorited (without calling up a separate entity filled with user data).

    fav_index = FavoriteIndex.all().ancestor(story).get()
    fav_of_current_user = users.get_current_user().user_id() in fav_index.favorited_by

  3. It's also easy to get a list of all the users who have favorited a story (using the method in #2).

Is there an easier way?

Please help. How is this kind of thing normally done?

13.10.2009 18:36:46
Maybe there's something I don't understand here, but according to the documentation on queries (code.google.com/appengine/docs/python/datastore/…) the use of lists is pretty limited. So if you want to find the favorites of a certain user, the query you write that filters by "favorited_by" is pretty much limited to 30 users... Am I missing something here?
Eran Kampf 24.10.2009 06:39:57

What you've described is a good solution. You can optimise it further, however: For each favorite, create a 'UserFavorite' entity as a child entity of the relevant Story entry (or equivalently, as a child entity of a UserInfo entry), with the key name set to the user's unique ID. This way, you can determine if a user has favorited a story with a simple get:

UserFavorite.get_by_key_name(user_id, parent=a_story)

get operations are 3 to 5 times faster than queries, so this is a substantial improvement.
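To make the key-name idea concrete, here is a minimal in-memory sketch, not real datastore code. ToyDatastore, its put and get_by_key_name methods, and the tuple-based keys are all hypothetical stand-ins; the point is only that the full key includes the parent path, so the same user ID can be reused as a key name under different stories, and the membership test is a single lookup by key.

```python
class ToyDatastore(object):
    """Hypothetical stand-in: entities stored under their full key path."""

    def __init__(self):
        self._entities = {}

    def put(self, parent_path, kind, key_name, data=None):
        # The full key is the parent's path plus (kind, key_name).
        full_key = parent_path + (kind, key_name)
        self._entities[full_key] = data or {}
        return full_key

    def get_by_key_name(self, parent_path, kind, key_name):
        # Direct lookup by full key; returns None when nothing is stored.
        return self._entities.get(parent_path + (kind, key_name))


store = ToyDatastore()
story_1 = ('Story', 1)
story_2 = ('Story', 2)

# The same key name ('user_a') under two different parents: two distinct keys.
store.put(story_1, 'UserFavorite', 'user_a')
store.put(story_2, 'UserFavorite', 'user_a')

is_fav = store.get_by_key_name(story_1, 'UserFavorite', 'user_a') is not None
not_fav = store.get_by_key_name(story_1, 'UserFavorite', 'user_b') is not None
```

In this model the favorite entity carries no properties at all; its existence under the story is the whole record.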

13.10.2009 20:04:12
Thank you very much, this is exactly what I was looking for. Btw, it should be get_by_key_name
wings 13.10.2009 20:25:11
2 quick questions: (1) i thought a key's name had to be unique, but i imagine your example works because it's only the key itself that has to be unique? (2) is my 'New Design' (scanning through the list of users in all favorite indexes to find a single user) really more efficient than just grabbing a set of keys, like i did in the 'Original Design'?
wings 13.10.2009 20:33:08
um, also: should the UserFavorite model have any properties?
wings 13.10.2009 20:41:38
1) Yes, it's the key that has to be unique, not just the key name, 2) I'm not sure I understand your question - I don't see any 'scanning' going on, just queries. 3) UserFavorite doesn't need any properties unless you want to associate some data (such as creation timestamp) with the favorite.
Nick Johnson 14.10.2009 10:15:16
1) It says in the docs that a key name has to be unique, but I was just saying that it doesn't have to be unique if it has a different parent from other entities of the same kind. 2) I just want to know if the new approach I described would scale with thousands of users and stories. It seems like the datastore must be doing a lot more work to find a user's favorite stories with the new approach as opposed to the original way I had it set up. Thank you very much for all of your answers :)
wings 14.10.2009 19:01:34

I don't want to tackle your actual question, but here's a very small tip: you can replace this code:

if story in user_data.favorites:
    story_is_favorited = True
else:
    story_is_favorited = False

with this single line:

story_is_favorited = (story in user_data.favorites)

You don't even need to put the parentheses around the story in user_data.favorites if you don't want to; I just think that's more readable.

13.10.2009 19:20:41

You can make the favorite index like a join on the two models

class FavoriteIndex(db.Model):
    user = db.UserProperty()
    story = db.ReferenceProperty()


class FavoriteIndex(db.Model):
    user = db.UserProperty()
    story = db.StringListProperty()

Then your query by user returns one FavoriteIndex object for each story the user has favorited.

You can also query by story to see how many users have favorited it.

You don't want to be scanning through anything unless you know it is limited to a small size.

13.10.2009 20:51:24

With your new design you can look up whether a user has favorited a certain story with a query.
You don't need the UserFavorite entities.
It is a keys_only query, so not as fast as a get(key) but faster than a normal query.
This assumes the FavoriteIndex entities all have the same key_name='favs'.
You can filter based on __key__.

a_story = ......
a_user_id = users.get_current_user().user_id()
favIndexKey = db.Key.from_path('Story', a_story.key().id_or_name(), 'FavoriteIndex', 'favs')
doesFavStory = FavoriteIndex.all(keys_only=True).filter('__key__ =', favIndexKey).filter('favorited_by =', a_user_id).get()

If you use multiple FavoriteIndex entities as children of a Story, you can use the ancestor filter:

doesFavStory = FavoriteIndex.all(keys_only=True).ancestor(a_story).filter('favorited_by =', a_user_id).get()
19.10.2009 15:27:48