Segments and Slices
ArraySegment
What the hell do I need that for? That was my first thought when I once again tried to implement TCP communication using System.Net.Sockets in C#. Because I had never used this type, I set out to find the difference between IList<ArraySegment<byte>> and the byte[] I had used before. First I looked at the MSDN page, which says:
ArraySegment<T> is a wrapper around an array that delimits a range of elements in that array. Multiple ArraySegment<T> instances can refer to the same original array and can overlap. The original array must be one-dimensional and must have zero-based indexing.
Note, however, that although the ArraySegment<T> structure can be used to divide an array into distinct segments, the segments are not completely independent of one another. The Array property returns the entire original array, not a copy of the array; therefore, changes made to the array returned by the Array property are made to the original array. If this is undesirable, you should perform operations on a copy of the array, rather than an ArraySegment<T> object that represents a portion of the array.
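To make the quoted behaviour concrete, here is a minimal sketch of two segments that view one shared buffer. The class name, the SplitFrame helper and the header/payload layout are assumptions made up for illustration. A list like this is the kind of argument the Socket.Send(IList<ArraySegment<byte>>) overload accepts.

```csharp
using System;
using System.Collections.Generic;

public static class SegmentSharingDemo
{
    // Hypothetical helper: builds two views (header and payload) over one buffer.
    // No bytes are copied; both segments reference the same array.
    public static IList<ArraySegment<byte>> SplitFrame(byte[] frame, int headerLength)
    {
        return new List<ArraySegment<byte>>
        {
            new ArraySegment<byte>(frame, 0, headerLength),
            new ArraySegment<byte>(frame, headerLength, frame.Length - headerLength)
        };
    }

    public static void Main()
    {
        var frame = new byte[8];
        IList<ArraySegment<byte>> parts = SplitFrame(frame, 2);

        // Writing through the payload segment changes the shared buffer.
        ArraySegment<byte> payload = parts[1];
        payload.Array[payload.Offset] = 0xFF;

        Console.WriteLine(frame[2]);                                      // 255
        Console.WriteLine(object.ReferenceEquals(parts[0].Array, frame)); // True
    }
}
```

If the segments must be independent of each other, copy the range first (for example with Array.Copy) and wrap the copy instead.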
Furthermore, I found that this struct has been part of the .NET Framework since version 2.0 (and has been used by the sockets API since .NET Framework 3.5), and that since version 4.6 ArraySegment<T> also implements the interface IReadOnlyCollection<T>, which represents a strongly typed, read-only collection of elements.
According to the documentation, the core benefit is that an ArraySegment<T> structure is useful whenever the elements of an array will be manipulated in distinct segments. But what does this mean in practice? What is so useful about it? Why should it be better than a list or another container type?
To find out, I played around with it a bit. For this blog post I put a sample project with my experiments on GitHub, using the MSDN sample code as a template:
private static void Usage()
{
    var buffer = new byte[10];
    var segment1 = new ArraySegment<byte>(buffer);
    var segment2 = new ArraySegment<byte>(buffer, 5, 3);

    Console.WriteLine("Origin:");
    PrintOut(buffer, segment1, segment2);
    Console.WriteLine();

    Console.WriteLine("Update Buffer[5]:");
    buffer[5] = 0x01;
    PrintOut(buffer, segment1, segment2);
    Console.WriteLine();

    Console.WriteLine("Update Segment1.Array[5]:");
    segment1.Array[5] = 0x02;
    PrintOut(buffer, segment1, segment2);
    Console.WriteLine();

    Console.WriteLine("Update Segment2.Array[Segment2.Offset]:");
    segment2.Array[segment2.Offset] = 0x03;
    PrintOut(buffer, segment1, segment2);
    Console.WriteLine();
}
The sample code above produces the following output:
As we can see, it's easy to use, but where is the benefit? To access a specific index I still seem to need the Array and Offset properties, as in segment.Array[segment.Offset] = 1. Not quite: thanks to the explicit interface implementation of IList<T>, you can do some cool stuff like this:
(segment2 as IList<byte>)[2] = 0x04;
var x = (segment2 as IList<byte>)[2];
Adding these lines to our sample produces the output below:
This fills the gap, but explicit interface implementations are not very common, and I find them awkward to use. It is done this way because ArraySegment<T> implements both IList<T> and IReadOnlyList<T>, and the two indexers need to be kept apart. I would prefer a second type, for example ReadOnlyArraySegment<T>, for better usability.
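One way to hide the cast is a pair of small extension methods. The following is only a sketch: GetAt and SetAt are hypothetical names and not part of the framework. It also shows that a segment converts to IReadOnlyList<T> without an explicit cast, at the cost of boxing the struct.

```csharp
using System;
using System.Collections.Generic;

public static class ArraySegmentIndexing
{
    // Hypothetical helper: reads the element at a segment-relative index.
    public static T GetAt<T>(this ArraySegment<T> segment, int index)
    {
        if (index < 0 || index >= segment.Count)
            throw new ArgumentOutOfRangeException(nameof(index));
        return segment.Array[segment.Offset + index];
    }

    // Hypothetical helper: writes the element at a segment-relative index.
    public static void SetAt<T>(this ArraySegment<T> segment, int index, T value)
    {
        if (index < 0 || index >= segment.Count)
            throw new ArgumentOutOfRangeException(nameof(index));
        segment.Array[segment.Offset + index] = value;
    }

    public static void Main()
    {
        var buffer = new byte[10];
        var segment = new ArraySegment<byte>(buffer, 5, 3);

        segment.SetAt(2, 0x04);
        Console.WriteLine(buffer[7]);        // 4 (offset 5 + index 2)
        Console.WriteLine(segment.GetAt(2)); // 4

        // Implicit (boxing) conversion to the read-only interface.
        IReadOnlyList<byte> readOnly = segment;
        Console.WriteLine(readOnly[2]);      // 4
    }
}
```

Unlike the interface cast, the helpers do not box the struct, and the bounds check keeps an index from reaching elements outside the segment.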
Performance
OK, we have seen the usage; now let's check the performance. The sample also contains a method that measures memory consumption and speed to see whether it pays off.
Test("List ", (arr, offset, elements) =>
    new List<int>(arr.Skip(offset).Take(elements)));
// The array case copies the range into a new array instead of a list.
Test("Array ", (arr, offset, elements) =>
    arr.Skip(offset).Take(elements).ToArray());
Test("ArraySegment", (arr, offset, elements) =>
    new ArraySegment<int>(arr, offset, elements));
Test("ArrayCopy ", (arr, offset, elements) =>
{
    var array = new int[elements];
    Array.Copy(arr, offset, array, 0, elements);
    return array;
});
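The Test method itself is not shown in this post. The following is a rough sketch of what such a harness could look like, not the code from the sample project: the generic signature, the source size, the fixed offset and element count, and the iteration count are all assumptions. It measures with Stopwatch and GC.GetTotalMemory, so the numbers are only approximate.

```csharp
using System;
using System.Diagnostics;

public static class SegmentBenchmark
{
    // Hypothetical harness: calls the factory repeatedly and prints the
    // average time and the approximate managed memory growth per call.
    public static void Test<TResult>(string name,
        Func<int[], int, int, TResult> factory, int iterations = 1000)
    {
        var source = new int[100000];
        var results = new TResult[iterations]; // keeps every result alive

        long memoryBefore = GC.GetTotalMemory(true);
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            results[i] = factory(source, 1000, 5000);
        watch.Stop();
        long memoryAfter = GC.GetTotalMemory(false);

        Console.WriteLine("{0}: {1:F4} ms, {2} bytes per call",
            name,
            watch.Elapsed.TotalMilliseconds / iterations,
            (memoryAfter - memoryBefore) / iterations);
        GC.KeepAlive(results);
    }

    public static void Main()
    {
        Test("ArraySegment", (arr, offset, elements) =>
            new ArraySegment<int>(arr, offset, elements));
        Test("ArrayCopy", (arr, offset, elements) =>
        {
            var array = new int[elements];
            Array.Copy(arr, offset, array, 0, elements);
            return array;
        });
    }
}
```

Because ArraySegment<int> is a struct, its results live inline in the preallocated results array, so the measured growth for that case should stay close to zero, while every array copy adds a new heap object.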
I think the result speaks for itself:
[Chart: average runtime in ms]
[Chart: average total memory in bytes]
Implementation
Given this evaluation, I think it is clear why this type is useful. Next, I took a look at the source code of the structure to see how it works, and extracted some parts to show the basics. As you can see in the following code, it is exactly what it claims to be: a wrapper around a T[] array that implements the IList<T> and IReadOnlyList<T> interfaces explicitly:
[Serializable]
public struct ArraySegment<T> : IList<T>, IReadOnlyList<T>
{
    private T[] _array;
    private int _offset;
    private int _count;

    public ArraySegment(T[] array, int offset, int count)
    {
        _array = array;
        _offset = offset;
        _count = count;
    }

    public T[] Array { get { return _array; } }
    public int Offset { get { return _offset; } }
    public int Count { get { return _count; } }
    ...
    T IList<T>.this[int index]
    {
        get { return _array[_offset + index]; }
        set { _array[_offset + index] = value; }
    }
    ...
}
Attention: Microsoft added the following code comment to the Count and Offset properties:
Since copying value types is not atomic and callers cannot atomically read all three fields, we cannot guarantee that Count/Offset is within the bounds of Array. That's our intent, but let's not specify it as a post condition - force callers to re-verify this themselves after reading each field out of an `ArraySegment` into their stack.
To check this yourself, you can use an implementation like the class Microsoft uses for the sockets (RangeValidatorHelper).
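The idea from the code comment can be sketched as follows. This is an illustrative helper of my own and not Microsoft's RangeValidatorHelper; the class and method names are assumptions. It reads each field exactly once into a local and then validates only the locals.

```csharp
using System;

public static class SegmentValidation
{
    // Hypothetical helper: re-verifies that a segment lies within its array.
    public static bool IsValid<T>(ArraySegment<T> segment)
    {
        // Read every field once; later checks use only these locals.
        T[] array = segment.Array;
        int offset = segment.Offset;
        int count = segment.Count;

        return array != null
            && offset >= 0
            && count >= 0
            && offset <= array.Length - count; // avoids overflow of offset + count
    }

    public static void Main()
    {
        var buffer = new byte[4];
        Console.WriteLine(IsValid(new ArraySegment<byte>(buffer, 1, 3))); // True
        Console.WriteLine(IsValid(default(ArraySegment<byte>)));          // False
    }
}
```

A default(ArraySegment<T>) has a null Array, which is why the null check comes first.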
Span
Now let us take a look into the future. In the last few sections we saw what ArraySegments can be used for. Microsoft (formerly joeduffy) is now working on a new structure named Span, which is part of a preview NuGet package called System.Slices. The package is published on the NuGet feed "https://dotnet.myget.org/F/dotnet-corefxlab/api/v3/index.json" and may become part of C# 7.
Description
Span is a uniform API for dealing with arrays and subarrays, strings and substrings, and unmanaged memory buffers. It adds minimal overhead to regular accesses and is a struct so that creation and subslicing do not require additional allocations. It is type- and memory-safe.
Samples
To show the usage of the new span-type, I also created some methods in the sample project:
private unsafe static void FillSpans()
{
    // Over an array:
    Span<int> ints = new Span<int>(new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 });
    // Over a copy of a string's chars (ToArray() needs System.Linq):
    Span<char> chars = new Span<char>("Hello, Slice!".ToArray());
    // Over an unmanaged memory buffer:
    byte* bb = stackalloc byte[256];
    Span<byte> bytes = new Span<byte>(bb, 256);

    PrintSpan1(ints);
    PrintSpan1(chars);
    PrintSpan1(bytes);
    PrintSpan2(ints);
    PrintSpan2(chars);
    PrintSpan2(bytes);
}
private static void PrintSpan1<T>(Span<T> slice)
{
    for (int i = 0; i < slice.Length; i++)
        Console.Write("{0} ", slice[i]);
    Console.WriteLine();
}

private static void PrintSpan2<T>(Span<T> slice)
{
    foreach (T t in slice)
        Console.Write("{0} ", t);
    Console.WriteLine();
}
As you can see, this structure is similar to ArraySegment, but spans do not implement the two interfaces. Instead, Span implements IEquatable<T[]>.
Span provides a few constructors, one of which supports initialization from a pointer; we will discuss that later in the section Unsafe To Safe. In contrast to ArraySegment, the index operator can be used directly, without casting. Sub-slicing works without additional allocations as well, as you can see in the following snippet:
private static void Subslicing()
{
    // Extract a sub-span
    Span<int> ints = new Span<int>(
        new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 });
    Span<int> subints = ints.Slice(5, 3);

    // No reallocation of the string
    ReadOnlySpan<char> testSpan = "Span Test".Slice();
    int space = testSpan.IndexOf(' ');
    ReadOnlySpan<char> firstName = testSpan.Slice(0, space);
    ReadOnlySpan<char> lastName = testSpan.Slice(space + 1);
}
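For comparison, here is the same idea as a sketch against the Span<T> API as it later shipped in System.Memory and .NET Core 2.1. It is not the System.Slices preview used above. In the released API a string is viewed with AsSpan() instead of the preview's Slice() extension; the class and method names below are assumptions.

```csharp
using System;

public static class SpanSlicingDemo
{
    // Sums a range of an array through a span, without copying the range.
    public static int SumRange(int[] values, int start, int length)
    {
        Span<int> slice = values.AsSpan(start, length);
        int sum = 0;
        foreach (int value in slice)
            sum += value;
        return sum;
    }

    public static void Main()
    {
        var ints = new[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        Console.WriteLine(SumRange(ints, 5, 3)); // 18 (5 + 6 + 7)

        // A sub-span writes through to the original array.
        Span<int> sub = ints.AsSpan().Slice(5, 3);
        sub[0] = 50;
        Console.WriteLine(ints[5]);              // 50

        // Slicing a string view allocates no new strings.
        ReadOnlySpan<char> text = "Span Test".AsSpan();
        int space = text.IndexOf(' ');
        ReadOnlySpan<char> first = text.Slice(0, space);
        ReadOnlySpan<char> second = text.Slice(space + 1);
        Console.WriteLine(first.ToString());     // Span
        Console.WriteLine(second.ToString());    // Test
    }
}
```

Only the final ToString() calls allocate strings; the slices themselves are just views over the original characters.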
Unsafe To Safe
Another benefit of this type is, as described previously, the unsafe constructor. A span can be created over raw memory in an unsafe method and then be used anywhere else in a safe way, as the following listing shows.
unsafe void Unsafe(byte* payload, int length)
{
    Safe(new Span<byte>(payload, length));
}

void Safe(Span<byte> payload)
{
    // now the payload can be handled in a safe way because it is wrapped
}
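A slightly fuller sketch of this boundary follows, again assuming the Span<T> API as it later shipped rather than the preview package. The method names are made up for illustration, and the code needs unsafe blocks enabled in the project (AllowUnsafeBlocks).

```csharp
using System;

public static class UnsafeToSafeDemo
{
    // Safe code: works on the span, never sees a pointer.
    public static int Sum(ReadOnlySpan<byte> payload)
    {
        int sum = 0;
        for (int i = 0; i < payload.Length; i++)
            sum += payload[i]; // every access is bounds-checked
        return sum;
    }

    // Unsafe boundary: wraps the raw pointer exactly once.
    public static unsafe int SumNative(byte* payload, int length)
    {
        return Sum(new ReadOnlySpan<byte>(payload, length));
    }

    // Fills a small stack buffer with 1..4 and sums it through the span.
    public static unsafe int Demo()
    {
        byte* buffer = stackalloc byte[4];
        for (int i = 0; i < 4; i++)
            buffer[i] = (byte)(i + 1);
        return SumNative(buffer, 4);
    }

    public static void Main()
    {
        Console.WriteLine(Demo()); // 10 (1 + 2 + 3 + 4)
    }
}
```

The span must not outlive the memory it points to; here the stack buffer stays valid because Sum returns before Demo does.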
Language integration
While writing this blog post I read many discussions about this new type, and I put together a short list of the top 6:
- https://github.com/dotnet/corefxlab/issues/816
- https://github.com/dotnet/roslyn/issues/98
- https://github.com/dotnet/roslyn/issues/120
- https://github.com/dotnet/roslyn/issues/10378
- https://github.com/dotnet/corefx/issues/7593
- https://github.com/dotnet/corefx/issues/6740
Some really cool things will come if this type becomes part of the language. I copied one of the proposed usages out of the discussion to show you the (possible) future:
int[] primes = new int[] { 2, 3, 5, 7, 9, 11, 13 };
int item = primes[1]; // Regular array access, producing the value 3
int[:] a = primes[0:3]; // A slice with elements {2, 3, 5}
int[:] b = primes[1:2]; // A slice with elements {3}
int[:] c = primes[:5]; // A slice with elements {2, 3, 5, 7, 9}
int[:] d = primes[2:]; // A slice with elements {5, 7, 9, 11, 13}
int[:] e = primes[:]; // A slice with elements {2, 3, 5, 7, 9, 11, 13}
int[:] f = a[1:2]; // A slice with elements {3}
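Until such syntax exists, the same slices can be approximated with ArraySegment. The sketch below uses a hypothetical Slice helper whose end index is exclusive, matching the comments in the proposal; it is not an existing framework method.

```csharp
using System;

public static class SliceSyntaxToday
{
    // Hypothetical helper mirroring the proposed source[from:to] (end exclusive).
    public static ArraySegment<int> Slice(int[] source, int from, int to)
    {
        return new ArraySegment<int>(source, from, to - from);
    }

    // Slice of a slice: indices are relative to the outer segment.
    public static ArraySegment<int> Slice(ArraySegment<int> source, int from, int to)
    {
        if (from < 0 || to < from || to > source.Count)
            throw new ArgumentOutOfRangeException();
        return new ArraySegment<int>(source.Array, source.Offset + from, to - from);
    }

    public static void Main()
    {
        int[] primes = { 2, 3, 5, 7, 9, 11, 13 }; // values as in the proposal

        ArraySegment<int> a = Slice(primes, 0, 3);             // primes[0:3]
        ArraySegment<int> b = Slice(primes, 1, 2);             // primes[1:2]
        ArraySegment<int> d = Slice(primes, 2, primes.Length); // primes[2:]
        ArraySegment<int> f = Slice(a, 1, 2);                  // a[1:2]

        Console.WriteLine(string.Join<int>(", ", a)); // 2, 3, 5
        Console.WriteLine(string.Join<int>(", ", b)); // 3
        Console.WriteLine(string.Join<int>(", ", d)); // 5, 7, 9, 11, 13
        Console.WriteLine(string.Join<int>(", ", f)); // 3
    }
}
```

As with the proposed slices, none of these segments copies any elements; they all refer to the primes array.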
I think it will be very interesting to see the final version of Slices, and I am eagerly waiting for it.
Updated on 07.11.2016:
SpanOfT Review
Summary
In the reference section and in the Language integration section you can find some additional links I used to learn and understand the functionality. I hope you find this article helpful. If so, or if you have questions or find bugs, please post in the comment section below. I will gratefully use your input to improve this blog post.