How can I make my simple .NET LRU cache faster?

UPDATE: Hey guys, thanks for the replies. Last night and tonight I tried a few different approaches and came up with one similar to the one laid out below by Jeff (I had even already done what he suggested in his update, and put together my own simple linked-list implementation for additional gains). Here is the code; at this point it doesn't look particularly clean anymore, but I have been over it numerous times, changing anything I could to beef up performance.

public class NewLRU2<K, V> where V : class
{
    int m_iMaxItems;
    Dictionary<K, LRUNode<K, V>> m_oMainDict;

    private LRUNode<K,V> m_oHead;
    private LRUNode<K,V> m_oTail;
    private LRUNode<K,V> m_oCurrent;

    public NewLRU2(int iSize)
    {
        m_iMaxItems = iSize;
        m_oMainDict = new Dictionary<K, LRUNode<K,V>>();

        m_oHead = null;
        m_oTail = null;
    }

    public V this[K key]
    {
        get
        {
            m_oCurrent = m_oMainDict[key];

            if (m_oCurrent == m_oHead)
            {
                //do nothing
            }
            else if (m_oCurrent == m_oTail)
            {
                m_oTail = m_oCurrent.Next;
                m_oTail.Prev = null;

                m_oHead.Next = m_oCurrent;
                m_oCurrent.Prev = m_oHead;
                m_oCurrent.Next = null;
                m_oHead = m_oCurrent;
            }
            else
            {
                m_oCurrent.Prev.Next = m_oCurrent.Next;
                m_oCurrent.Next.Prev = m_oCurrent.Prev;

                m_oHead.Next = m_oCurrent;
                m_oCurrent.Prev = m_oHead;
                m_oCurrent.Next = null;
                m_oHead = m_oCurrent;
            }

            return m_oCurrent.Value;
        }
    }

    public void Add(K key, V value)
    {
        if (m_oMainDict.Count >= m_iMaxItems)
        {   
            //remove old
            m_oMainDict.Remove(m_oTail.Key);

            //reuse old
            LRUNode<K, V> oNewNode = m_oTail;
            oNewNode.Key = key;
            oNewNode.Value = value;

            m_oTail = m_oTail.Next;
            m_oTail.Prev = null;

            //add new
            m_oHead.Next = oNewNode;
            oNewNode.Prev = m_oHead;
            oNewNode.Next = null;
            m_oHead = oNewNode;
            m_oMainDict.Add(key, oNewNode);
        }
        else
        {
            LRUNode<K, V> oNewNode = new LRUNode<K, V>(key, value);
            if (m_oHead == null)
            {
                m_oHead = oNewNode;
                m_oTail = oNewNode;
            }
            else
            {
                m_oHead.Next = oNewNode;
                oNewNode.Prev = m_oHead;
                m_oHead = oNewNode;
            }
            m_oMainDict.Add(key, oNewNode);
        }
    }

    public bool Contains(K key)
    {
        return m_oMainDict.ContainsKey(key);
    }
}


internal class LRUNode<K,V>
{
    public LRUNode(K key, V val)
    {
        Key = key;
        Value = val;
    }

    public K Key;
    public V Value;
    public LRUNode<K, V> Next;
    public LRUNode<K, V> Prev;
}

There are a few parts that look/feel wonky -- like reusing the old node when doing an add -- but I was able to get an appreciable performance boost out of them. I was also slightly surprised at the difference it made to switch from actual properties on the node to plain public fields, but I guess that's how it goes with this stuff. At this point the code above is almost entirely performance-limited by the dictionary operations, so I'm not sure I'd get much more out of reworking it. I'll continue to think on it and look into some of the responses.

Explanation From Original Post: Hello all. So I've written a simple lightweight LRU implementation for use in a compression library (I'm using it to find matching byte-strings in the input based on a hash, LZW-style), and I'm looking for ways to make it faster.
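For context, here's a rough sketch of how a cache like this might sit in that kind of matcher. The MatchFinder wrapper, the hash function, and the cache size are all made up for illustration; this is not the compressor's actual code:

using System.Collections.Generic;

// Hypothetical usage sketch: the cache maps a hash of the next few input
// bytes to the positions where that hash was last seen (LZW-style).  The
// hash function, cache size, and MatchFinder wrapper are placeholders.
internal class MatchFinder
{
    private readonly NewLRU2<int, List<int>> m_oRecent = new NewLRU2<int, List<int>>(4096);

    private static int HashBytes(byte[] data, int pos)
    {
        // Placeholder 3-byte hash, not the compressor's real one.
        return (data[pos] << 16) ^ (data[pos + 1] << 8) ^ data[pos + 2];
    }

    public void RecordPosition(byte[] data, int pos)
    {
        int h = HashBytes(data, pos);
        if (m_oRecent.Contains(h))
        {
            m_oRecent[h].Add(pos);   // the indexer get also bumps the entry to the head
        }
        else
        {
            m_oRecent.Add(h, new List<int> { pos });
        }
    }
}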


UPDATE #2

This avoids traversing the list on a Remove, because the dictionary stores the LinkedListNode directly. It introduces an LruCacheNode that holds both the key and the value; the key is only needed when you trim the cache. You could get better performance still if you wrote your own linked list implementation where each node is essentially an LruCacheNode along with a Next and Back reference. This is roughly what a LinkedHashMap does (see these two questions).

public class LruCache<K, V>
{
    private readonly int m_iMaxItems;
    private readonly Dictionary<K, LinkedListNode<LruCacheNode<K, V>>> m_oMainDict;
    private readonly LinkedList<LruCacheNode<K, V>> m_oMainList;

    public LruCache(int iSize)
    {
        m_iMaxItems = iSize;
        m_oMainDict = new Dictionary<K, LinkedListNode<LruCacheNode<K, V>>>();
        m_oMainList = new LinkedList<LruCacheNode<K, V>>();
    }

    public V this[K key]
    {
        get
        {
            return BumpToFront(key).Value;
        }
        set
        {
            BumpToFront(key).Value = value;
        }
    }

    public void Add(K key, V value)
    {
        LinkedListNode<LruCacheNode<K, V>> newNode = m_oMainList.AddFirst(new LruCacheNode<K, V>(key, value));
        m_oMainDict.Add(key, newNode);

        if (m_oMainList.Count > m_iMaxItems)
        {
            m_oMainDict.Remove(m_oMainList.Last.Value.Key);
            m_oMainList.RemoveLast();
        }
    }

    private LruCacheNode<K, V> BumpToFront(K key)
    {
        LinkedListNode<LruCacheNode<K, V>> node = m_oMainDict[key];
        if (m_oMainList.First != node)
        {
            m_oMainList.Remove(node);
            m_oMainList.AddFirst(node);
        }
        return node.Value;
    }

    public bool Contains(K key)
    {
        return m_oMainDict.ContainsKey(key);
    }
}

internal sealed class LruCacheNode<K, V>
{
    private readonly K m_Key;
    private V m_Value;

    public LruCacheNode(K key, V value)
    {
        m_Key = key;
        m_Value = value;
    }

    public K Key
    {
        get { return m_Key; }
    }

    public V Value
    {
        get { return m_Value; }
        set { m_Value = value; }
    }
}
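For reference, a minimal usage sketch of the class above -- the key/value types and the cache size here are arbitrary, just to show the shape of the API:

LruCache<string, byte[]> cache = new LruCache<string, byte[]>(256);

// Add puts the entry at the front; going past the capacity evicts the
// least-recently-used entry from both the list and the dictionary.
cache.Add("block-17", new byte[] { 0x1F, 0x8B });

if (cache.Contains("block-17"))
{
    byte[] data = cache["block-17"];   // the indexer get bumps the entry to the front
}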

You'll have to profile things to see if this is an improvement in your environment.

Minor Update: I updated BumpToFront to check whether the node is already at the front, per the comment from Tim Stewart.


Isn't the point of an LRU cache to let you trim the cache and throw out the least-recently-used stuff? :-) I don't see any code to trim the cache. Since you most likely want high performance for the retrieve use-case, and the trim use-case is less important, why not offload the list maintenance to the trim process?

IOW, just throw the entries into the cache, but stamp them on retrieval. Don't reorder the entries, just mark them whenever they're used. The stamp could be a true DateTime timestamp, or a simple counter in the class where the highest number is the most recently used. Then, in the trim process, just walk the whole collection and remove the entries with the oldest stamps.
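A rough sketch of that idea, assuming a simple counter as the stamp and a separate Trim pass -- the class and its members are invented here purely to illustrate the approach, not tested code:

using System.Collections.Generic;
using System.Linq;

// Sketch of the "mark on read, trim later" idea: retrieval is just a
// dictionary lookup plus a counter increment; no reordering on the hot path.
public class StampedCache<K, V>
{
    private class Entry
    {
        public V Value;
        public long LastUsed;
    }

    private readonly Dictionary<K, Entry> m_oDict = new Dictionary<K, Entry>();
    private long m_lClock;

    public void Add(K key, V value)
    {
        m_oDict[key] = new Entry { Value = value, LastUsed = ++m_lClock };
    }

    public V this[K key]
    {
        get
        {
            Entry e = m_oDict[key];
            e.LastUsed = ++m_lClock;   // mark it as recently used, don't reorder anything
            return e.Value;
        }
    }

    public bool Contains(K key)
    {
        return m_oDict.ContainsKey(key);
    }

    // The trim pass is the slow path: walk everything and drop the oldest stamps.
    public void Trim(int targetCount)
    {
        if (m_oDict.Count <= targetCount)
            return;

        List<K> oldest = m_oDict.OrderBy(kv => kv.Value.LastUsed)
                                 .Take(m_oDict.Count - targetCount)
                                 .Select(kv => kv.Key)
                                 .ToList();
        foreach (K key in oldest)
        {
            m_oDict.Remove(key);
        }
    }
}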


With hardware caches, instead of having, say, 128 elements and maintaining the order of items 1-128, you might arrange it as 32 x 4: 32 rows of 4 elements each. The first 5 bits of an address determine which of the 32 rows that address maps to; you then search only those 4 items, and if the item isn't found you replace the oldest of the 4.

This is much faster, and is IIRC within 10% of the hit rate of a 1 x 128 cache.

To translate this to your case: instead of one linked list you would have several, so traversing any one of them is much faster. You would need a way of determining which list a particular item maps to.

The point being that as your list grows in size, you get diminishing returns from trying to maintain the exact order of each element with perfect accuracy. You might even be better off with an unordered list, randomly replacing any element when you have a cache miss. It depends on the size of your list and the penalty for a miss versus the cost of maintaining the list.
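A rough C# sketch of that set-associative idea, keeping the 32 x 4 geometry from the hardware analogy -- the class and its row-selection hash are assumptions for illustration, not tested code:

// Sketch of a 32-row, 4-way set-associative cache: each key hashes to one
// small row, and "least recently used" only has to be tracked within that row.
public class SetAssociativeCache<K, V>
{
    private const int Rows = 32;
    private const int Ways = 4;

    private struct Slot
    {
        public bool Used;
        public K Key;
        public V Value;
        public long LastUsed;
    }

    private readonly Slot[,] m_aSlots = new Slot[Rows, Ways];
    private long m_lClock;

    private static int RowFor(K key)
    {
        // The software stand-in for "the first 5 bits of the address": pick one of 32 rows.
        return (key.GetHashCode() & 0x7FFFFFFF) % Rows;
    }

    public bool TryGet(K key, out V value)
    {
        int row = RowFor(key);
        for (int way = 0; way < Ways; way++)   // only 4 slots to scan
        {
            if (m_aSlots[row, way].Used && m_aSlots[row, way].Key.Equals(key))
            {
                m_aSlots[row, way].LastUsed = ++m_lClock;
                value = m_aSlots[row, way].Value;
                return true;
            }
        }
        value = default(V);
        return false;
    }

    public void Add(K key, V value)
    {
        // Replace the oldest slot in the key's row (unused slots have LastUsed == 0,
        // so they get picked first).  A fuller version would also check whether the
        // key is already present in the row.
        int row = RowFor(key);
        int victim = 0;
        for (int way = 1; way < Ways; way++)
        {
            if (m_aSlots[row, way].LastUsed < m_aSlots[row, victim].LastUsed)
            {
                victim = way;
            }
        }
        m_aSlots[row, victim] = new Slot { Used = true, Key = key, Value = value, LastUsed = ++m_lClock };
    }
}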
