C# Data Structures - Dictionary and SortedList

TL;DR — Key Takeaways
  • Dictionary is an O(1) hash-table structure and the most commonly used key-value collection in game development
  • SortedDictionary is an O(log n) red-black tree that keeps sorted order even during inserts/deletes
  • SortedList has slow insertion (O(n)) but is memory efficient and supports index access like an array
  • Use Dictionary for raw performance, SortedDictionary for sorted mutable data, and SortedList for memory savings + index access

Introduction

In game development, how you store and look up data directly affects performance. If you need to query thousands of monster stats by ID, keep an inventory sorted, or display ranking data in order, the optimal data structure differs by scenario.

In C#, the three representative key-value collections are Dictionary<TKey, TValue>, SortedList<TKey, TValue>, and SortedDictionary<TKey, TValue>. They look similar on the surface, but their internal implementations are completely different, and so are their performance characteristics.

This article analyzes the internal behavior of each collection from a .NET runtime source-code perspective and summarizes which one to choose, and when, from a game development standpoint.


Part 1: Internal Structures

1. Dictionary - Hash Table Based

Dictionary<TKey, TValue> is the most frequently used key-value collection in C#. Internally it is implemented as a hash table, providing average O(1) lookup.

Real internal layout: buckets + entries

At the core, Dictionary uses two arrays:

// .NET Runtime source (simplified)
private int[] buckets;     // hash -> entries index mapping
private Entry[] entries;   // array storing actual data

private struct Entry
{
    public uint hashCode;  // hash value of key
    public int next;       // next entry index in same bucket (-1 means end)
    public TKey key;
    public TValue value;
}

The key point is that chaining is not a separate linked list. It uses the next index inside the same entries array. Since entries are in one contiguous array, cache locality is preserved.

flowchart TB
    subgraph Buckets["buckets[] (size: prime)"]
        B0["[0] -> 2"]
        B1["[1] -> -1 (empty)"]
        B2["[2] -> 0"]
        B3["[3] -> 3"]
        B4["[4] -> -1"]
        B5["[5] -> 1"]
        B6["[6] -> -1"]
    end

    subgraph Entries["entries[] (contiguous array)"]
        E0["[0] hash=9284 next=-1<br/>Key=MOB_001 Val=Goblin"]
        E1["[1] hash=5831 next=-1<br/>Key=MOB_005 Val=Slime"]
        E2["[2] hash=3500 next=4<br/>Key=MOB_003 Val=Orc"]
        E3["[3] hash=7423 next=-1<br/>Key=MOB_002 Val=Dragon"]
        E4["[4] hash=3507 next=-1<br/>Key=MOB_008 Val=Minotaur"]
    end

    B0 -->|"index 2"| E2
    B2 -->|"index 0"| E0
    B3 -->|"index 3"| E3
    B5 -->|"index 1"| E1
    E2 -->|"next=4<br/>(collision in same bucket)"| E4

Lookup process (TryGetValue)

1. hashCode = key.GetHashCode()
2. bucketIndex = hashCode % buckets.Length
3. entryIndex = buckets[bucketIndex]
4. compare key with entries[entryIndex].key
   - if equal -> return value
   - if not -> follow entries[entryIndex].next
   - if next == -1 -> key not found

A game analogy: think of it as a cabinet with indexed drawers. You choose a drawer by hash, then check labels inside it. Multiple entities can share one drawer (collision), but usually only one is there, so lookup is almost immediate.

Resizing: Prime number based

If collisions increase, lookup performance can degrade toward O(n). Dictionary resizes when its entries array fills up, and the new size is not simply 2x: it is the smallest prime number >= twice the current size.

// Prime table used internally (.NET HashHelpers.cs excerpt)
// 3, 7, 11, 17, 23, 29, 37, 47, 59, 71, 89, 107, 131, 163, 197, 239, 293, 353, 431, 521, 631, ...

Why prime? Bucket index uses hashCode % bucketCount. Prime bucket sizes generally produce better distribution regardless of low-bit hash patterns. With powers of two (16, 32, 64…), low-bit bias can cause heavy collisions.

Core point: resizing allocates new buckets and entries, then rehashes all entries, which is O(n). If you know approximate size, set initial capacity to avoid expensive resizes.

Before - no initial capacity
// Resizing may happen multiple times
var monsterStats =
    new Dictionary<string, MonsterData>();

foreach (var data in allMonsterData)
{
    monsterStats[data.Id] = data;
}
After - set initial capacity
// no resizes in the ideal case
var monsterStats =
    new Dictionary<string, MonsterData>(
        allMonsterData.Count);

foreach (var data in allMonsterData)
{
    monsterStats[data.Id] = data;
}
flowchart LR
    subgraph Before["Before resize (buckets size: 7)"]
        direction TB
        BB["entries 5/7 used<br/>collision chains forming"]
    end

    subgraph Resize["Resize process"]
        R1["1. new size = HashHelpers.GetPrime(7 x 2)<br/>   -> 17 (smallest prime >= 14)"]
        R2["2. allocate new buckets[17], entries[17]"]
        R3["3. rehash all entries<br/>   (reposition via hashCode % 17)"]
    end

    subgraph After["After resize (buckets size: 17)"]
        direction TB
        BA["entries 5/17 used<br/>fewer collisions"]
    end

    Before --> Resize --> After

GetHashCode and Equals contract

For Dictionary correctness, key type must follow GetHashCode() / Equals() contract:

| Rule | Meaning |
| --- | --- |
| a.Equals(b) == true | a.GetHashCode() == b.GetHashCode() must also hold |
| a.GetHashCode() == b.GetHashCode() | a.Equals(b) may still be false (collisions are allowed) |
| key is stored in the Dictionary | its GetHashCode() result must not change |

Q. Why should mutable objects not be used as keys? If a key’s fields change after insertion, GetHashCode() may change. Lookup then goes to a different bucket, making the value appear lost.

Q. Is string.GetHashCode() stable across processes? No. .NET Core / .NET 5+ uses per-process hash seed for HashDoS mitigation. Do not persist hash codes to files or network. Persist the original key values. Unity Mono is often deterministic, but IL2CPP may follow random-seed policy, so be careful.
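To make the mutable-key hazard concrete, here is a minimal sketch; the GridPos type and its field values are hypothetical:

```
// Hypothetical mutable key type -- do NOT do this in real code
class GridPos
{
    public int X, Y;
    public override int GetHashCode() => X * 397 ^ Y;
    public override bool Equals(object obj) =>
        obj is GridPos p && p.X == X && p.Y == Y;
}

var map = new Dictionary<GridPos, string>();
var key = new GridPos { X = 1, Y = 2 };
map[key] = "treasure";

key.X = 99; // hash code changes while the key is stored

// The entry still sits in the bucket chosen by the OLD hash,
// so this lookup probes the wrong bucket and fails.
bool found = map.ContainsKey(key); // false
```

The value is still in memory but is effectively unreachable, which is why immutable keys (or struct keys whose copies cannot mutate the stored entry) are the safe choice.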


2. SortedList - Sorted Array Based

SortedList<TKey, TValue> uses two sorted arrays internally: one for keys, one for values, always maintained in key order.

// .NET Runtime source (simplified)
private TKey[] keys;      // sorted key array
private TValue[] values;  // value array aligned by index
private int _size;        // element count (keys.Length >= _size)
flowchart TB
    subgraph SortedList["SortedList internals"]
        direction TB
        subgraph Keys["keys[] (sorted)"]
            K0["[0] 100"]
            K1["[1] 200"]
            K2["[2] 300"]
            K3["[3] 500"]
            K4["[4] 800"]
        end
        subgraph Values["values[] (same index as key)"]
            V0["[0] Goblin"]
            V1["[1] Orc"]
            V2["[2] Dragon"]
            V3["[3] Slime"]
            V4["[4] Phoenix"]
        end
        subgraph Idx["Index access: O(1)"]
            I0["Keys[2] -> 300"]
            I1["Values[2] -> Dragon"]
        end
    end

Because keys are sorted, it uses binary search (Array.BinarySearch()) with O(log n) lookup.

Array: [100, 200, 300, 500, 800], target key = 500

Step 1: lo=0, hi=4, mid=2 -> 300 < 500 -> lo=3
Step 2: lo=3, hi=4, mid=3 -> 500 == 500 -> found (index 3)

Even with 1,000,000 elements, it takes about 20 comparisons (log2 scale).
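SortedList's lookup is essentially this binary search over parallel arrays; a sketch using Array.BinarySearch directly (the monster data is made up):

```
int[] keys = { 100, 200, 300, 500, 800 };
string[] values = { "Goblin", "Orc", "Dragon", "Slime", "Phoenix" };

int index = Array.BinarySearch(keys, 500);
// index >= 0 -> key found at that position
// index < 0  -> ~index is the insertion point that keeps the array sorted
string result = index >= 0 ? values[index] : null; // "Slime"
```

The ~index convention for misses is exactly what SortedList uses to decide where a new key must be inserted.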

Insertion: cost of shifting arrays

Its weakness is insertion. To preserve sort order, mid insertion must shift all trailing elements by one using Array.Copy().

flowchart TB
    subgraph Before["Before insertion: insert key=250"]
        B["[0]100  [1]200  [2]300  [3]500  [4]800"]
    end

    subgraph Step1["Step 1: find insert index by binary search"]
        S1["~index = 2 (between 200 and 300)"]
    end

    subgraph Step2["Step 2: Array.Copy [2..4] -> [3..5]"]
        S2["[0]100  [1]200  [_]___  [3]300  [4]500  [5]800"]
    end

    subgraph After["Step 3: insert at index 2"]
        A["[0]100  [1]200  [2]250  [3]300  [4]500  [5]800"]
    end

    Before --> Step1 --> Step2 --> After

Worst-case insertion is O(n). Inserting at the front of a 100,000-element list shifts 100,000 elements in both keys[] and values[].

Game analogy: putting books in order on a shelf. Appending at the end is fast; inserting in the middle requires pushing many books.

Q. Does SortedList support index access? Yes. sortedList.Keys[index] and sortedList.Values[index] are O(1). This is the biggest difference from SortedDictionary.

Q. Does insertion order matter? A lot. Inserting sorted ascending data appends mostly at the end (faster). Reverse/random can cause repeated shifts and become much slower.

Q. Count vs Capacity? Count = actual elements, Capacity = internal array size. Pre-set Capacity for bulk inserts and call TrimExcess() after finishing to reclaim extra memory.
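A small sketch tying these three answers together (the drop-table data is hypothetical):

```
// Pre-size Capacity for the bulk load
var dropTable = new SortedList<int, string>(4096);

dropTable.Add(300, "Dragon");
dropTable.Add(100, "Goblin");
dropTable.Add(200, "Orc");

// O(1) index access -- not possible with SortedDictionary
int medianKey    = dropTable.Keys[dropTable.Count / 2];   // 200
string medianVal = dropTable.Values[dropTable.Count / 2]; // "Orc"

// Reclaim the unused Capacity (4096 -> ~Count) after loading
dropTable.TrimExcess();
```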


3. SortedDictionary - Red-Black Tree Based

SortedDictionary<TKey, TValue> also keeps keys sorted, but implementation is completely different: a self-balancing BST, specifically a red-black tree.

Red-black tree invariants

  1. Every node is Red or Black
  2. Root is always Black
  3. Red node must have Black children (no red-red chain)
  4. Every root-to-leaf path has the same number of Black nodes

These rules limit tree height to at most about 2 * log2(n+1).

flowchart TB
    subgraph RBTree["Red-Black Tree (root=300)"]
        N300["300<br/>⬛ Black"]
        N200["200<br/>🔴 Red"]
        N500["500<br/>⬛ Black"]
        N100["100<br/>⬛ Black"]
        N250["250<br/>⬛ Black"]
        N800["800<br/>🔴 Red"]

        N300 --> N200
        N300 --> N500
        N200 --> N100
        N200 --> N250
        N500 --> N800
    end

Why RB tree instead of AVL?

| Property | AVL Tree | Red-Black Tree |
| --- | --- | --- |
| Balance strictness | strict (height diff <= 1) | looser (black-height based) |
| Insert rotations | up to 2, but stricter rebalance checks | up to 2, mostly color flips propagate |
| Delete rotations | can require O(log n) rotations | up to 3 |
| Lookup | slightly faster | slightly slower |
| Insert/delete | slower in practice | faster in mutable workloads |

For key-value collections with frequent writes, RB tree is usually a pragmatic choice.

Rotation example during insertion

For inserting Key=150:

flowchart LR
    subgraph Before["Before insert"]
        direction TB
        A300["300 ⬛"]
        A200["200 🔴"]
        A100["100 ⬛"]
        A300 --> A200
        A200 --> A100
    end

    subgraph Insert["Insert 150 (Red)"]
        direction TB
        B300["300 ⬛"]
        B200["200 🔴"]
        B100["100 ⬛"]
        B150["150 🔴 <- NEW"]
        B300 --> B200
        B200 --> B100
        B100 --> B150
    end

    subgraph Rebalance["Rotate + recolor"]
        direction TB
        C200["200 ⬛"]
        C150["150 🔴"]
        C300["300 🔴"]
        C100["100 ⬛"]
        C200 --> C150
        C200 --> C300
        C150 --> C100
    end

    Before -->|"BST insert"| Insert
    Insert -->|"Fix RB rule violation"| Rebalance

Key point: rotation count is constant (usually max 2-3 per operation). Even with 1M nodes, each insert needs only a few rotations.

But each node is a separate heap object, so memory overhead is high.

Q. Is SortedDictionary.First() O(1)? No. LINQ's First() builds an enumerator and walks from the root down to the leftmost node: O(log n).

Q. Is iteration order guaranteed? Yes. foreach runs in ascending key order via in-order traversal.
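A quick sketch of that iteration guarantee (the event names are made up):

```
var timeline = new SortedDictionary<float, string>();
timeline[2.5f] = "SpawnBoss";
timeline[0.5f] = "PlayIntro";
timeline[1.0f] = "OpenGate";

// In-order traversal: ascending key order, regardless of insertion order
foreach (var kv in timeline)
    Console.WriteLine(kv.Value);
// PlayIntro, OpenGate, SpawnBoss
```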


Part 2: Performance Comparison

4. Time complexity comparison

| Operation | Dictionary | SortedList | SortedDictionary |
| --- | --- | --- | --- |
| Lookup | O(1) avg | O(log n) | O(log n) |
| Insert | O(1) avg | O(n) (array shift) | O(log n) |
| Delete | O(1) avg | O(n) (array shift) | O(log n) |
| Ordered iteration | O(n log n) (extra sort) | O(n) | O(n) |
| Index access | N/A | O(1) | N/A |
| Min/Max | O(n) | O(1) | O(log n) |

5. Practical benchmark tendency

Big-O is not enough for practical judgment. For around 10,000 int keys, rough trends are:

| Operation (10,000 items) | Dictionary | SortedList | SortedDictionary |
| --- | --- | --- | --- |
| Bulk insert | ~0.3ms | ~15ms (random) / ~1ms (sorted) | ~3ms |
| Single lookup | ~20ns | ~150ns | ~200ns |
| Full iteration | ~0.1ms | ~0.05ms | ~0.15ms |
| Memory | ~450KB | ~240KB | ~640KB |

Key observations:

  • Dictionary lookup is often 7-8x faster than SortedList
  • SortedList insertion can differ up to 15x depending on input order
  • SortedDictionary usually uses the most memory

Benchmark skeleton:

// Quick benchmark example in Unity
public void BenchmarkCollections(int count)
{
    var sw = new System.Diagnostics.Stopwatch();
    var random = new System.Random(42);
    var keys = Enumerable.Range(0, count).OrderBy(_ => random.Next()).ToArray();

    sw.Restart();
    var dict = new Dictionary<int, int>(count);
    foreach (var key in keys) dict[key] = key;
    sw.Stop();
    Debug.Log($"Dictionary Insert: {sw.Elapsed.TotalMilliseconds:F3}ms");

    sw.Restart();
    var sortedList = new SortedList<int, int>(count);
    foreach (var key in keys) sortedList[key] = key;
    sw.Stop();
    Debug.Log($"SortedList Insert (random): {sw.Elapsed.TotalMilliseconds:F3}ms");

    sw.Restart();
    var sortedDict = new SortedDictionary<int, int>();
    foreach (var key in keys) sortedDict[key] = key;
    sw.Stop();
    Debug.Log($"SortedDictionary Insert: {sw.Elapsed.TotalMilliseconds:F3}ms");
}

6. Memory usage comparison

| Structure | Internal layout | Overhead per element | Memory characteristic |
| --- | --- | --- | --- |
| Dictionary | buckets[] + entries[] | ~28 bytes (Entry struct) | contiguous arrays + some empty slots |
| SortedList | keys[] + values[] | ~16 bytes | dense contiguous arrays, very cache-friendly |
| SortedDictionary | tree node objects | ~48+ bytes | scattered heap objects, high GC tracking cost |

flowchart TB
    subgraph Memory["Memory layout comparison"]
        subgraph Dict["Dictionary"]
            DM["entries[]: [E0][E1][E2][E3][__][__]<br/>buckets[]: [2][-1][0][3][-1][-1][-1]<br/>contiguous arrays, some empty slots"]
        end
        subgraph SL["SortedList"]
            SM["keys[]:   [K0][K1][K2][K3]<br/>values[]: [V0][V1][V2][V3]<br/>dense contiguous arrays, cache-friendly"]
        end
        subgraph SD["SortedDictionary"]
            SDM["[Node] -> [Node] -> [Node] -> ...<br/>each node is independent heap object<br/>scattered memory, higher GC pressure"]
        end
    end

Unity GC perspective: GC spikes cause frame drops. SortedDictionary creates one heap object per node. 10,000 elements means 10,000 GC-tracked objects. SortedList and Dictionary usually stay at a constant number of tracked containers (mainly arrays).


Part 3: Practical Usage

7. Selection guide by game scenario

flowchart LR
    subgraph Decision["Data structure decision flow"]
        Q1{"Need sorted order?"}
        Q2{"Frequent inserts/deletes?"}
        Q3{"Need index-based access?"}

        Q1 -->|No| D["Dictionary<br/>(default for most cases)"]
        Q1 -->|Yes| Q2
        Q2 -->|Yes| SD["SortedDictionary"]
        Q2 -->|No, mostly read| Q3
        Q3 -->|Yes| SL["SortedList"]
        Q3 -->|No| SD2["SortedDictionary<br/>(safe default)"]
    end

Master data lookup -> Dictionary

private Dictionary<int, MonsterMasterData> monsterTable;

public void LoadMasterData(IReadOnlyList<MonsterMasterData> rawData)
{
    monsterTable = new Dictionary<int, MonsterMasterData>(rawData.Count);

    foreach (var data in rawData)
    {
        monsterTable[data.ID] = data;
    }
}

public MonsterMasterData GetMonster(int monsterID)
{
    return monsterTable.TryGetValue(monsterID, out var data) ? data : null;
}

Ranking / leaderboard -> SortedList (handle ties carefully)

SortedList keys must be unique. If score alone is key, ties overwrite.

// BAD: ties overwrite previous player
// var ranking = new SortedList<int, string>();
// ranking[100] = "PlayerA";
// ranking[100] = "PlayerB"; // PlayerA disappears

// GOOD: composite key for uniqueness
private SortedList<(int score, int negID), string> ranking
    = new SortedList<(int score, int negID), string>();

public void AddToRanking(int score, int playerID, string playerName)
{
    ranking[(score, -playerID)] = playerName;
}

public IEnumerable<(int score, string name)> GetTopRankers(int count)
{
    int total = ranking.Count;
    for (int i = total - 1; i >= Math.Max(0, total - count); i--)
    {
        yield return (ranking.Keys[i].score, ranking.Values[i]);
    }
}

Realtime event timeline -> SortedDictionary

private SortedDictionary<(float time, int id), GameEvent> eventTimeline
    = new SortedDictionary<(float time, int id), GameEvent>();
private int nextEventID;

public void ScheduleEvent(float time, GameEvent evt)
{
    eventTimeline[(time, nextEventID++)] = evt;
}

public void ProcessEvents(float currentTime)
{
    while (eventTimeline.Count > 0)
    {
        var first = eventTimeline.First();
        if (first.Key.time > currentTime) break;

        first.Value.Execute();
        eventTimeline.Remove(first.Key);
    }
}

8. Practical tips and cautions

Set Dictionary initial capacity

// BAD: repeated resize
var dict = new Dictionary<int, string>();
for (int i = 0; i < 10000; i++)
    dict.Add(i, $"item_{i}");

// GOOD: avoid resize
var dict = new Dictionary<int, string>(10000);
for (int i = 0; i < 10000; i++)
    dict.Add(i, $"item_{i}");

Even rough capacity estimation reduces resize frequency significantly. .NET 6+ also provides EnsureCapacity(int capacity).
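EnsureCapacity also helps when merging a batch into a dictionary that is already populated; a sketch (incomingBatch is a hypothetical list):

```
var stats = new Dictionary<int, int> { [1] = 10, [2] = 20 };
var incomingBatch = new List<KeyValuePair<int, int>>(); // e.g. a loaded save chunk

// Reserve room up front so the merge triggers at most one resize
stats.EnsureCapacity(stats.Count + incomingBatch.Count);

foreach (var kv in incomingBatch)
    stats[kv.Key] = kv.Value;
```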

enum key caution (Unity Mono)

// BAD: boxing with enum key can happen in Unity Mono
var dict = new Dictionary<MyEnum, string>();

// GOOD: custom comparer to avoid boxing
public struct MyEnumComparer : IEqualityComparer<MyEnum>
{
    public bool Equals(MyEnum x, MyEnum y) => x == y;
    public int GetHashCode(MyEnum obj) => (int)obj;
}

var dict = new Dictionary<MyEnum, string>(new MyEnumComparer());

Q. Why boxing on enum key? Unity Mono may use comparer paths that box enum values in certain equality operations. .NET Core 2.1+ JIT often optimizes this better.

Q. TryGetValue vs ContainsKey + indexer? Prefer TryGetValue always.

// BAD: hashes and probes the key twice
if (dict.ContainsKey(key))
{
    var value = dict[key];
    DoSomething(value);
}

// GOOD: one lookup
if (dict.TryGetValue(key, out var value))
    DoSomething(value);

Optimize bulk insert for SortedList

// BAD: random insert order -> repeated O(n) shifts
var sortedList = new SortedList<int, string>();
foreach (var item in unsortedItems)
    sortedList.Add(item.Key, item.Value);

// GOOD: pre-sort then insert
var sorted = unsortedItems.OrderBy(x => x.Key).ToArray();
var sortedList = new SortedList<int, string>(sorted.Length);
foreach (var item in sorted)
    sortedList.Add(item.Key, item.Value);

9. Extensions: ConcurrentDictionary and read-only exposure

ConcurrentDictionary - thread-safe access

// BAD: Dictionary is not thread-safe for concurrent writes
private Dictionary<int, CachedData> cache = new();

// GOOD: use ConcurrentDictionary for multi-threaded access
private ConcurrentDictionary<int, CachedData> cache = new();

var data = cache.GetOrAdd(key, k => LoadData(k));

ConcurrentDictionary has overhead. Do not use it in single-thread-only contexts.

ReadOnlyDictionary - safe external exposure

private Dictionary<int, MonsterData> monsterTable;
private ReadOnlyDictionary<int, MonsterData> readOnlyTable;

public void LoadMasterData(IReadOnlyList<MonsterData> rawData)
{
    monsterTable = new Dictionary<int, MonsterData>(rawData.Count);
    foreach (var data in rawData)
        monsterTable[data.ID] = data;

    readOnlyTable = new ReadOnlyDictionary<int, MonsterData>(monsterTable);
}

public IReadOnlyDictionary<int, MonsterData> MonsterTable => readOnlyTable;

10. Final comparison summary

| Criterion | Dictionary | SortedList | SortedDictionary |
| --- | --- | --- | --- |
| Internal structure | hash table (buckets[] + entries[]) | sorted arrays (keys[] + values[]) | red-black tree |
| Keeps sorted order | No | Yes | Yes |
| Lookup | O(1) | O(log n) | O(log n) |
| Insert/Delete | O(1) | O(n) | O(log n) |
| Index access | No | O(1) | No |
| Min/Max | O(n) | O(1) | O(log n) |
| Memory efficiency | Medium | Good | Poor |
| GC burden | Low | Lowest | High |
| Best scenario | fast ID-based lookup | sorted read-heavy data + index access | sorted data with frequent insert/delete |
| Game example | master tables, caches | rankings, mostly fixed tables | event scheduler, timeline |

Checklist

Use this in code reviews:

[✅] Did you set proper initial capacity for Dictionary?

[✅] Are Dictionary keys immutable and contract-safe for GetHashCode/Equals?

[✅] Are you using TryGetValue instead of ContainsKey + indexer?

[✅] Are you avoiding SortedList/SortedDictionary when sorting is unnecessary?

[✅] For bulk SortedList insert, are items inserted in pre-sorted order?

[✅] In Unity Mono, did you consider custom IEqualityComparer<T> for enum keys?

[✅] For sorted collections, did you handle key uniqueness (composite key if needed)?

[✅] If multi-threaded access is required, did you use ConcurrentDictionary?

[✅] Is external exposure read-only via IReadOnlyDictionary?


Conclusion

Choosing data structures is not about “which is best overall” but which best fits the use case.

  • Dictionary: hash table with buckets[] + entries[]; chaining via next inside entries preserves locality. Prime-based resizing helps distribution. Best default for unsorted key-value use.
  • SortedList: sorted keys[] + values[]; memory-efficient and supports index access, but mid insertion is O(n) because of array shifts.
  • SortedDictionary: red-black tree; good for frequent sorted inserts/deletes, but per-node heap allocation means higher GC pressure in Unity.

Once you understand internals, you can quickly reason about why something is slow or memory-heavy. Like rendering pipeline knowledge helps graphics optimization, data-structure internals are the foundation for sound gameplay-system design.

This post is licensed under CC BY 4.0 by the author.