On the comp.sys.mac.programmer.help newsgroup, Florian Zschocke asked about improving the performance of his Core Data application. Here’s an adapted version of my reply to his post.
Core Data applications should scale quite well to large data sets when using an SQLite persistent store. That said, there are a couple of implementation tactics that are critical to performance for pretty much any application using a technology like Core Data:
- Maintain a well-normalized data model.
- Don’t fetch or keep around more data than you need to.
Implementing these tactics will make it much easier both to create well-performing Core Data applications in the first place, and to optimize the performance of applications already in progress.
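As a concrete illustration of the second tactic, you can constrain a fetch so only the rows you'll actually display come back from the SQLite store. This is just a minimal sketch; the entity and attribute names, and the `context` variable, are illustrative.

```objc
NSFetchRequest *request = [[[NSFetchRequest alloc] init] autorelease];
[request setEntity:[NSEntityDescription entityForName:@"Person"
                               inManagedObjectContext:context]];

// Narrow the fetch to just the objects you need...
[request setPredicate:
    [NSPredicate predicateWithFormat:@"name BEGINSWITH %@", @"A"]];

// ...and don't bring back more rows than you're going to show.
[request setFetchLimit:50];

NSError *error = nil;
NSArray *results = [context executeFetchRequest:request error:&error];
```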
Maintaining a normalized data model is critical for not fetching more data than you need from a persistent store, because for data consistency Core Data will fetch all of the attributes of an instance at once. For example, consider a Person entity that can have a binary data attribute containing a picture. Even if you’re just displaying a table of Person instances by name, Core Data will still fetch the picture because it’s an attribute of Person. Thus for performance in a situation like this, you’d normalize your data so that you have a separate entity, Picture, to represent the picture for a Person on the other side of a relationship. That way the image data will only be retrieved from the persistent store if the relationship is actually traversed; until it’s traversed, it will just be represented by a fault.
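To sketch how this plays out in code: assuming the picture data has been moved to a separate Picture entity behind a to-one picture relationship, and that a `person` instance was fetched earlier (all names here are illustrative), accessing the Person's own attributes never touches the image data.

```objc
// Accessing Person's own attributes doesn't load the image data:
NSString *name = [person valueForKey:@"name"];

// The related Picture is still a fault at this point:
BOOL stillAFault = [[person valueForKey:@"picture"] isFault];

// Traversing the relationship fires the fault, pulling the image
// data out of the persistent store:
NSData *imageData = [person valueForKeyPath:@"picture.data"];
```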
Similarly, if you have lots of to-many relationships and need to display summary information about them, de-normalizing your data model slightly and caching the summary information in the main entity can help.
For example, say your app works with Authors and Books. Author.books is a to-many relationship to Book instances, and Book.authors is a to-many relationship to Author instances. You may want to show a table of Authors that includes the number of Books related to each Author. However, binding to books.@count for that column value will cause the relationship fault to fire for every Author displayed, which can generate a lot more traffic to the persistent store than you want.

One strategy would be to de-normalize your data model slightly so Author also contains a booksCount attribute, and to maintain that attribute whenever the Author.books relationship is maintained. This way you can avoid firing the Author.books relationship fault just because you want to display the number of Books an Author is related to, by binding the column value to booksCount instead of books.@count.
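One place to maintain such an attribute is in the relationship mutators on a custom NSManagedObject subclass. This is a hedged sketch, assuming an Author subclass with a modeled to-many books relationship and an integer booksCount attribute; the names are illustrative.

```objc
@interface Author : NSManagedObject
- (void)addBooksObject:(NSManagedObject *)book;
- (void)removeBooksObject:(NSManagedObject *)book;
@end

@implementation Author

- (void)addBooksObject:(NSManagedObject *)book {
    NSMutableSet *books = [self mutableSetValueForKey:@"books"];
    [books addObject:book];

    // Keep the cached count in sync with the relationship.
    [self setValue:[NSNumber numberWithUnsignedInteger:[books count]]
            forKey:@"booksCount"];
}

- (void)removeBooksObject:(NSManagedObject *)book {
    NSMutableSet *books = [self mutableSetValueForKey:@"books"];
    [books removeObject:book];

    [self setValue:[NSNumber numberWithUnsignedInteger:[books count]]
            forKey:@"booksCount"];
}

@end
```

The table column is then bound to booksCount, so displaying an Author never has to fire the books relationship fault at all; the fault only fires when the relationship is actually mutated or traversed.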
Another thing to be careful of is entity inheritance. It’s an implementation detail, but inheritance in Core Data is single-table: if every entity in your application inherits from one abstract entity, all of your data will wind up in a single table, which can make fetches take longer because they have to scan more data.
Retaining or copying the arrays containing fetch results will keep those results (and their associated row cache entries) in memory for as long as you retain the arrays or copies of them, because the arrays and any copies will be retaining the result objects from the fetch. And as long as the result objects are in memory, they’ll also be registered with a managed object context.
If you want to prune your in-memory object graph, you can use -[NSManagedObjectContext refreshObject:mergeChanges:] to effectively turn an object back into a fault, which can also prune its relationship faults. A more extreme measure would be to use -[NSManagedObjectContext reset] to return a context to a completely clean state, with no changes or registered objects. Finally, you can of course just ensure that any managed objects that don’t have changes are properly released, following the normal Cocoa memory management rules: so long as your managed object context isn’t set to retain registered objects, and you aren’t retaining objects that you’ve fetched, they’ll be released normally like any other autoreleased objects.
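The first two options look like this in practice; `context` and `person` are assumed to exist from earlier fetches.

```objc
// Turn a clean object back into a fault, releasing its attribute
// data (and potentially its relationship faults). Passing NO
// discards cached state rather than merging in changes.
[context refreshObject:person mergeChanges:NO];

// The more extreme option: forget every registered object and any
// pending changes, returning the context to a clean slate.
[context reset];
```

Note that after -reset, any managed objects you were still holding from that context are no longer usable, so it's best reserved for points where you know nothing else references them.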