Wednesday, April 1, 2009

How to Speed Up Finding an Element in a System.Collection.IList

I've been working on a program for some time now that I inherited. It's a great program, most of the code is designed well, but some of the implementations of the design have been less than desirable, especially when you are talking about speed. Let me explain:

I've recently found a bottleneck in some of the code I've been optimizing. For instance, I have a System.Collection.IList that is basically given to me to use (since it has been implemented using a third party tool, the trade-off for removing it was not an option). I don't recall what the IList was implemented as, but the IList does not allow me to call a List.Contains(object value), or List.IndexOf(object value). If it did, I would not be in the situation I am. The bottom line is that I needed to have an implementation of the .Contains() and .IndexOf() methods. So here's how it was implemented:

public bool Contains(Guid _reviewAnswerId)
{
int listCount = myList.Count;
for (int i = 0; i < listCount; i++)
{
if (myList[i].ReviewAnswerId.Equals(_reviewAnswerId))
return true
}
return false;
}


public int IndexOf(Guid _reviewAnswerId)
{
int listCount = myList.Count;
for (int i = 0; i < listCount; i++)
{
if (myList[i].ReviewAnswerId.Equals(_reviewAnswerId))
return i;
}
return -1;
}


As you can see from the code above, the way you find a match is to loop through each object (I could have used a foreach statement, but I was trying to optimize and a for loop helped slightly) and match the value. If a match is found then "true" is returned and if not then "false" is returned for the Contains() method. For the IndexOf() method, you are looking for the index where the object resides in the IList, so you can reference it in some other place of the code.

So this is all fine and dandy if your collection is garanteed to stay small (which was the intention of the collection at design time). This method of search is a linear search, O(n). But you'll see that its actually O(n^3). Since then, the design has been extended to allow the collection to contain upwards of 5000 objects (which really is not that many, so I thought). It turns out, that these objects have multiple data elements, and everytime the object was referenced it would take a very long time to check (I found it by pausing the debugger when the program was halted for the time it was performing this routine). This looping and checking would not be performed once, but between 2000 to 4000 times, and that would be performed 400 to 800 times, so the number of times the list is accessed would be more like 800 * 4000 * 5000 = 16,000,000,000 billion times. OUCH!

So there are a few things that I did to fix the search time. First of all, where I could, I pulled all the direct calls to the object ("List[i]") into a local variable to reduce the overhead of those calls (though this did not do much). Second, I implemented a System.Collections.Hashtable that was pretty much a clone of the IList (a clone of the objects and the number of objects, not the order they are stored). Hashtables are typically O(1). Which means that you'll find you value a heck of a lot faster than a linear search. So each time I add an object to the IList, I add the same object to the Hashtable, and the same goes for removing objects. I then changed the Contains() method to the following:

public bool Contains(Guid _reviewAnswerId)
{
return _hashTable.ContainsKey(_reviewAnswerId);
}


The IndexOf() method then goes away, since you can then reference the object in the following manner:

return (CastToMyObject)(_hashTable[_reviewAnswerId]);

BINGO! Problem solved.

No comments:

Post a Comment