<%@ page title="" language="C#" masterpagefile="~/MasterPage.master" autoeventwireup="true" inherits="issue1_cpp_vs_csharp_arrays, App_Web_w3f1bb6p" %> The Treatyist - C# vs C++, Arrays

C# vs C++, Arrays

Comparison Research

It seems to be a settled case that C++ is fast, and .NET is slow. Yet, there is really not so much analysis out there to actually support that. With increasing time demands and ever competitive demands for more features, more quickly, I felt it useful to know just how much I would be giving up in performance if I just ditched C++ and started writing everying in C#. The results thus far may surprise some.

A good place to start in such an investigation is arrays. Arrays are one of the most fundamental structures of computing, so fundamental that processors assume that they are array crunching engines and optimize for that.

Making a program "cache friendly" means organizing data as densely as possible, and in order, and that means a array. For x86, the old ESI/RSI, EDI/RDI registers came to mean source and destination index into arrays. And, the SSE instructions are a great tool for dealing with arrays. Finally, strings, of text, are usually implemented as a specialization of an array.

Note that in all of the examples of disassembly, full optimization was on, and on both sides.

The C++ Native Array

C++ has a well deserved reputation for being a speedy array processor, and it should. In that language, an array is a pointer. The much maligned pointer is just an indirect memory reference in assembly language. Let's have a look at a simple loop in C++:

void populateWithPointerLoop(int *arr, int length) { int *endarr = arr + length; int i = 0; while (arr < endarr) { *arr++ = i * i; i++; } } In our particular test program, the body of the loop assembles into this: 000000013F9310B1 mov eax,ecx 000000013F9310B3 imul eax,ecx 000000013F9310B6 mov dword ptr [rbx],eax 000000013F9310B8 add rbx,4 000000013F9310BC inc ecx 000000013F9310BE cmp rbx,rdx 000000013F9310C1 jb main+0B1h (13F9310B1h)

For this roundup, that's about as good as the compilers do. This is pretty quick but it isn't perfect. The point is, though, that vanilla pointer and array loops in C and C++ are going to lead to reasonably efficient machine code. Some of the touches of the compiler are interesting. Our test calling sequence just calls each function in turn. The compiler not only was smart enough to inline all the function calls, it also converted a parameter to a function into an immediate mode cmp instruction. That's pretty nice.

C# Pointer Array

C# has a reputation for being somewhat sluggish. For array crunching, this is entirely deserved. C# is rather wordy for array processing. The fastest code that I could come up with, casually, is to use unsafe code and a C# pointer. Yes, C# does have pointers, but they are more like a married man at a friend's bachelor party then they are a single man at a bar. They have so many restrictions you almost wonder what the point is. The source looks like this:

unsafe static void populateWithPointerForLoop(int[] arr) { fixed (int* c = &arr[0]) { int l = arr.Length; for (int i = 0; i < l; i++) { c[i] = i * i; } } }

What appears to be the loop of the generated product from above looks like:

0000008c movsxd rdx,dword ptr [rsp+2Ch] 00000091 mov ecx,dword ptr [rsp+2Ch] 00000095 imul ecx,dword ptr [rsp+2Ch] 0000009a mov rax,qword ptr [rsp+20h] 0000009f mov dword ptr [rax+rdx*4],ecx 000000a2 mov eax,dword ptr [rsp+2Ch] 000000a6 add eax,1 000000a9 mov dword ptr [rsp+2Ch],eax 000000ad mov eax,dword ptr [rsp+28h] 000000b1 cmp dword ptr [rsp+2Ch],eax 000000b5 jl 000000000000008C

It's more involved, for sure. There's more instructions. As a rule for this sort of thing, that means slower code. Its worth noting that the setup was far more complicated than the C / C++ comparisons. Of course, that is not the way one normally uses arrays in C#.

C# Basic Arrays

Our simple loop, with stock arrays, in C#, would look like this:

static void populateWithForLoopInvariant(int[] arr) { int l = arr.Length; for (int i = 0; i < l; i++) { arr[i] = i * i; i++; } }

Given a piece of code like that, the compiler dutifully generates this:

00000045 mov eax,dword ptr [rsp+24h] 00000049 imul eax,dword ptr [rsp+24h] 0000004e mov dword ptr [rsp+28h],eax 00000052 movsxd rcx,dword ptr [rsp+24h] 00000057 mov rax,qword ptr [rsp+50h] 0000005c mov rax,qword ptr [rax+8] 00000060 mov qword ptr [rsp+30h],rcx 00000065 cmp qword ptr [rsp+30h],rax 0000006a jae 0000000000000078 0000006c mov rax,qword ptr [rsp+30h] 00000071 mov qword ptr [rsp+30h],rax 00000076 jmp 000000000000007D 00000078 call FFFFFFFFF96583F0 0000007d mov rdx,qword ptr [rsp+50h] 00000082 mov rcx,qword ptr [rsp+30h] 00000087 mov eax,dword ptr [rsp+28h] 0000008b mov dword ptr [rdx+rcx*4+10h],eax 0000008f mov eax,dword ptr [rsp+24h] 00000093 add eax,1 00000096 mov dword ptr [rsp+24h],eax

Clearly there's more involved in this approach. This is seemingly not as efficient as the C or C++ code I showed earlier, but, is it -that- much less efficient? The function call in the middle of the loop is troubling, but it appears like it is calling something that throws an exception if the array bounds is exceeeded. It's certainly wordier than any of the C++ implementations so far, but then again, the C++ example doesn't check the array bounds. STL does though. Let's have a look at that.

C++ STL vector

Pure vectors are the fastest way to go, but many C++ developers choose to use the safer STL for everything now, rather than the basic arrays. STL is C++'s standard library efficiency is but one of STL's goals. Let's look at our C++ simple STL program:

void populateWithStdVector( std::vector& v, int length ) { for (int i = 0; i < length; i++) { v[i] = i * i; } } and then its dissassembly 000000013FB810D0 mov rax,qword ptr [rsp+48h] 000000013FB810D5 mov rcx,qword ptr [rsp+40h] 000000013FB810DA sub rax,rcx 000000013FB810DD sar rax,2 000000013FB810E1 cmp rdi,rax 000000013FB810E4 jb main+0F1h (13FB810F1h) 000000013FB810E6 call qword ptr [__imp__invalid_parameter_noinfo (13FB82138h)] 000000013FB810EC mov rcx,qword ptr [rsp+40h] 000000013FB810F1 mov eax,ebx 000000013FB810F3 imul eax,ebx 000000013FB810F6 mov dword ptr [rcx+rdi*4],eax 000000013FB810F9 inc ebx 000000013FB810FB inc rdi 000000013FB810FE cmp ebx,64h 000000013FB81101 jl main+0D0h (13FB810D0h)

STL clearly bloats up C++ a bit. It makes us wonder. If you are writing in C++ but are willing to trade performance to get better libraries, why not just use a language like C#? More than a few C++ STL fans would get better results with C# collections.

Benchmark: C# array vs C++ STL vector vs C++ native array

Benchmarks Defined

I created three benchmarks.

  1. array set - array set is a timed derivative of the basic array assignment code we have been discussing.
  2. big streamish - squares the source array into a dest array. The array sizes are large enough to make memory bandwidth an issue.
  3. little streamish - like big streamish, but with more trials and smaller arrays in an attempt to reduce the effects of memory bandwidth on the timing.

Results


Discussion

The results here are interesting. Memory bandwidth outweighed code generation in the big streamish benchmark. All three systems performed nearly identically, with STL trailing the pack. With that obstacle removed, a native array approach performed best overall across all three benchmarks. Surprisingly, C#'s array implementation did not trail too far behind C++'s native array

Like so many other writers, I admit that my benchmarking doesn't tell the full story of the capabilities of any product. In particular, I will probably revise this article to include comparison's with GCC as I suspect GCC at this point probably produces better 64 bit code for simple array crunching than does Visual C++.

Conclusions

I would say that if you are embarking on the creation of a new scientific or financial computing product, using Microsoft tools on Windows, and your gut is telling you to use STL on Visual C++ because you want the convenience and the safety, you should probably have a look at C# instead. C# is safer and faster than STL on Visual C++. The only way that you can get the kind of performance that Visual C++ is famous for is to live dangerously, and take a C approach to arrays, not a C++ one. Even then, you will be surprised at how often C#'s array implementation comes close to C++'s native arrays for performance.

One thing to notice about C# is that I had arranged the timing checks to not include the time to compile the code. It looks like we can make any C# benchmark worse by moving the timing loop outside of a code block. C# will compiles on demand but in our case even that took but a few milliseconds.

Be careful about using these benchmarks to draw conclusions about other compilers on other platforms. Do not say that you will use Mono over GNU C++ on Linux because of this page, but you should have a look at the STL vector source code you are using, and understand what you are getting into. Know your C++ compiler and STL implementation, and do a sanity check against a language that is easier to program. If you can get the performance you need in an application language rather than a systems one, certainly take the convenience of the application language and get your solution deployed more quickly.

System

These benchmarks were written and executed on the release candidate for Windows 7, 64 bit, using Visual Studio 2008 SP1, and .net framework 3.5 SP1. The hardware was 2P AMD Opteron 270 processor box, with 2GB of RAM, on a Tyan S2885 motherboard. The wstream benchmark for copy, stream, and triad for this machine are about 1440mb/s. It's a dinosaur folks, and I look forward to replicating these results on a Nehalem architecture machine, once I can obtain one.