Sunday, 23 October 2016

Object copying speed comparison

I've been wondering how fast different kinds of property-copying code run, the sort of code usually found in adapter pattern implementations. Answers can be found on Stack Overflow and in various blogs, but I wanted to push it a bit further and compare the speed of several different approaches to this problem.

Bear in mind that each way of doing the same thing has its own advantages, disadvantages and caveats. I've selected a handful of approaches; there are many more that I've either missed or disregarded. Some of the listed methods differ only minimally from each other, and some would never be used in production code.

As the common use case for all of them, I've selected copying data from a data transfer object (DTO) to a View Model object.
  1. ManualMap - ordinary property-by-property copying of whatever the object contains; this is how it's most often done by hand. Uses a foreach loop to iterate over the objects.
  2. ManualForArrayMap - takes an array of DTOs and uses a for loop. It could take a list as well.
  3. LinqMap - same as ManualMap, but uses LINQ to iterate over the items.
  4. AutoMapperMap - uses AutoMapper and a foreach loop to iterate over all the items.
  5. AutoMapperLinqMap - same as above, but uses LINQ.
  6. AutoMapperCollectionMap - maps the whole collection in one go instead of looping through the items and mapping them one by one.
  7. ILMap - uses emitted IL code, cached as a delegate. Generic.
  8. ExpressionMap - uses expression trees to bind the objects' properties. Generic.
  9. ReflectionOrderedPropertiesCopy - uses reflection, assuming the properties appear in the same order in both types. Generic.
  10. ReflectionPropertySearchCopy - uses reflection and searches the object for each property by name. Generic.
  11. ParallelForEachManualMap - uses a partitioner to split the collection into separate ranges and processes them with a parallel foreach loop.
  12. ParallelLinqMap - Parallel LINQ (PLINQ) using the AsParallel method. Checked a few times with different degrees of parallelism.
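To make the difference between the loop-based variants concrete, here is a minimal sketch in Java. The post's own implementations are C# and live in the repository linked at the end. The `PersonDto` and `PersonViewModel` types and all method names are hypothetical. The stream and parallel-stream versions are Java's closest counterparts to LINQ and PLINQ, not ports of the author's code. In this sketch, every variant shares one copy helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical DTO standing in for the benchmark's source type.
class PersonDto {
    private final int id;
    private final String name;
    PersonDto(int id, String name) { this.id = id; this.name = name; }
    public int getId() { return id; }
    public String getName() { return name; }
}

// Hypothetical view model standing in for the benchmark's target type.
class PersonViewModel {
    private int id;
    private String name;
    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}

class ManualMappers {
    // Shared property-by-property copy used by every variant below.
    static PersonViewModel toViewModel(PersonDto dto) {
        PersonViewModel vm = new PersonViewModel();
        vm.setId(dto.getId());
        vm.setName(dto.getName());
        return vm;
    }

    // In the spirit of ManualMap: a for-each loop over the input collection.
    static List<PersonViewModel> forEachMap(List<PersonDto> dtos) {
        List<PersonViewModel> result = new ArrayList<>(dtos.size());
        for (PersonDto dto : dtos) result.add(toViewModel(dto));
        return result;
    }

    // In the spirit of ManualForArrayMap: an indexed for loop over an array.
    static PersonViewModel[] forArrayMap(PersonDto[] dtos) {
        PersonViewModel[] result = new PersonViewModel[dtos.length];
        for (int i = 0; i < dtos.length; i++) result[i] = toViewModel(dtos[i]);
        return result;
    }

    // In the spirit of LinqMap: a sequential stream pipeline (map is Java's counterpart of Select).
    static List<PersonViewModel> streamMap(List<PersonDto> dtos) {
        return dtos.stream().map(ManualMappers::toViewModel).collect(Collectors.toList());
    }

    // In the spirit of ParallelLinqMap: the same pipeline on a parallel stream,
    // which runs on the common ForkJoinPool and keeps the source order when collecting.
    static List<PersonViewModel> parallelStreamMap(List<PersonDto> dtos) {
        return dtos.parallelStream().map(ManualMappers::toViewModel).collect(Collectors.toList());
    }
}
```

Because all variants use the same copy helper, they differ only in how they iterate. That keeps the loop style as the only thing being compared.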
The slowest of the approaches above were the reflection-based ones (9 and 10); I haven't included them in the comparison because their numbers were off the chart. Methods 7 and 8 use reflection as well, but only when building the lookup. The time this takes is not included in the measurements, just as the AutoMapper configuration time isn't.
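The difference described above comes down to when the reflection happens. The Java sketch below contrasts the two cases. `searchCopy`, in the spirit of ReflectionPropertySearchCopy, looks up every getter by name on each call. `CachedCopier` does that lookup once in its constructor and afterwards only invokes cached `java.lang.invoke.MethodHandle`s. Method handles are a swapped-in stand-in for the cached delegate. They are not the IL emitting or expression-tree compilation used by ILMap and ExpressionMap. The `OrderDto`/`OrderViewModel` types and the `getX`/`setX` name-matching rule are assumptions of this sketch; the post's own code is C#.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical source/target pair with matching getter/setter names.
class OrderDto {
    private long id;
    private String customer;
    public long getId() { return id; }
    public void setId(long id) { this.id = id; }
    public String getCustomer() { return customer; }
    public void setCustomer(String customer) { this.customer = customer; }
}

class OrderViewModel {
    private long id;
    private String customer;
    public long getId() { return id; }
    public void setId(long id) { this.id = id; }
    public String getCustomer() { return customer; }
    public void setCustomer(String customer) { this.customer = customer; }
}

class ReflectionCopiers {
    static boolean isSetter(Method m) {
        return m.getName().startsWith("set") && m.getParameterCount() == 1;
    }

    // Maps "setX" to the source's "getX", or returns null if the source has no such getter.
    static Method findGetter(Class<?> sourceType, Method setter) {
        try {
            return sourceType.getMethod("get" + setter.getName().substring(3));
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    // Search-by-name copy: the reflective lookup is repeated on every call.
    static <T> T searchCopy(Object source, T target) {
        try {
            for (Method setter : target.getClass().getMethods()) {
                if (!isSetter(setter)) continue;
                Method getter = findGetter(source.getClass(), setter);
                if (getter != null) setter.invoke(target, getter.invoke(source));
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

// Reflection runs only in the constructor; copy() reuses the cached getter/setter handles.
final class CachedCopier<S, T> {
    private final List<MethodHandle[]> getterSetterPairs = new ArrayList<>();

    CachedCopier(Class<S> sourceType, Class<T> targetType) {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        try {
            for (Method setter : targetType.getMethods()) {
                if (!ReflectionCopiers.isSetter(setter)) continue;
                Method getter = ReflectionCopiers.findGetter(sourceType, setter);
                if (getter != null) {
                    getterSetterPairs.add(new MethodHandle[] { lookup.unreflect(getter), lookup.unreflect(setter) });
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    T copy(S source, T target) {
        try {
            for (MethodHandle[] pair : getterSetterPairs) {
                // pair[0] reads from the source, pair[1] writes the value into the target.
                pair[1].invoke(target, pair[0].invoke(source));
            }
            return target;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Create one `CachedCopier` per type pair and reuse it. That keeps the reflection cost out of the per-object path, much as the post leaves lookup building and AutoMapper configuration out of the measured time.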

To measure the timings I used the BenchmarkDotNet library, running it in Release mode outside of Visual Studio to get reasonably accurate results. Every result below is the median of multiple runs of the same method.
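BenchmarkDotNet handles warm-up, repetition and statistics for .NET code; JMH plays that role on the JVM. The minimal hand-rolled Java timer below only illustrates why a median of several runs is reported. It makes a few unmeasured warm-up calls, then times separate calls with `System.nanoTime` and returns the median. The median is less affected than the mean by a single slow run, such as one hit by a garbage-collection pause. `MedianTimer` is a hypothetical helper. Unlike a real benchmarking library, it does nothing to stop the JIT from optimizing away work whose result is unused.

```java
import java.util.Arrays;

final class MedianTimer {
    // Median of the samples; for an even count, the mean of the two middle values (integer division).
    static long median(long[] samples) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
    }

    // Runs `action` `warmup` times without measuring (giving the JIT a chance to compile it),
    // then times `runs` separate calls and returns the median duration in nanoseconds.
    static long medianNanos(Runnable action, int warmup, int runs) {
        if (runs <= 0) throw new IllegalArgumentException("runs must be positive");
        for (int i = 0; i < warmup; i++) action.run();
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            action.run();
            samples[i] = System.nanoTime() - start;
        }
        return median(samples);
    }
}
```

With an odd number of runs, the median is one of the measured samples rather than the average of two.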

Below is a comparison of the timings for different numbers of DTOs fed into the methods; all timings are in nanoseconds.

Comparison of timings.
Next, I created a chart based on the numbers above.


There are some obvious takeaways here; I'll go into more detail in the next post.

I'm pretty sure I've missed something or introduced some errors in the code, so if you notice anything, feel free to let me know. The code itself is sometimes under-optimized and sometimes over-optimized, but that wasn't the point of this post; the aim was to get a general feel for how fast this kind of boilerplate code can run.

Links:
GitHub repository with the sources for the comparison project: https://github.com/simonkatanski/speedtest/

4 comments:

  1. It's hard to compare Parallel to the other methods, since it depends heavily on the machine you're running on (e.g. CPU core count) and on DTO size.
    Parallel is something you might want to benchmark for your specific case in your specific environment, but it's hard to say "it's faster/slower than this other method" in general.
    But in general I believe object copying is not really a use case for Parallel: in most cases you'll get more context switching than real benefit.

    1. It's there just for comparison, and I bet what you're describing is the cause of the poor performance. The data isn't split into more meaningful chunks; it's just dynamically assigned to separate threads based on the configured degree of parallelism.

      The mechanism obviously has a different use case: longer-running code. As I mentioned, I'll get into the nitty-gritty details in the next post :)

      Thanks for the input.
      Szymon

  2. I also took a peek at the code. For predictable and comparable results it would be fairer to pass and return a specific type (and always a materialised collection) rather than IEnumerable, and not to call ToList() in your benchmarks.
    This could also make a difference in performance: I'd guess a foreach over an array might be faster than one over a list.
    See also this: https://www.infoq.com/articles/For-Each-Performance

    1. Thanks again for the input. I suppose for a fair comparison it would be best to isolate the actual copying code and benchmark it outside of the loops. It would be far more boring then, though.

      That article was a really interesting read; I might extend the comparison with some extra cases that pass different interfaces/concrete types as the input collection.

      Szymon
