Vector vs ArrayList performance

Everybody's saying that one should use vector because of the perfomance (cause Vector synchronizes after every operation and stuff). I've written a simple test:

import java.util.ArrayList;
import java.util.Date;
import java.util.Vector;

public class ComparePerformance {

    public static void main(String[] args) {
        ArrayList<Integer> list = new ArrayList<Integer>();
        Vector<Integer> vector = new Vector<Integer>();

        int size = 10000000;
        int listSum = 0;
        int vectorSum = 0;

        long startList = new Date().getTime();
        for (int i = 0; i < size; i++) {
            list.add(new Integer(1));
        }
        for (Integer integer : list) {
            listSum += integer;
        }
        long endList = new Date().getTime();
        System.out.println("List time: " + (endList - startList));

        long startVector = new Date().getTime();
        for (int i = 0; i < size; i++) {
            vector.add(new Integer(1));
        }
        for (Integer integer : list) {
            vectorSum += integer;
        }
        long endVector = new Date().getTime();
        System.out.println("Vector time: " + (endVector - startVector));
    }
}

The results are as follows:

List time: 4360
Vector time: 4103

Based on this it seems that Vector perfomance at iterating over and reading is slightly better. Maybe this is a dumb queston or I've made wrong assumptions - can somebody please explan this?


You have written a naïve microbenchmark. Microbenchmarking on the JVM is very tricky business and it is not even easy to enumerate all the pitfalls, but here are some classic ones:

  • you must warm up the code;
  • you must control for garbage collection pauses;
  • System.currentTimeMillis is imprecise, but you don't seem to be aware of even this method (your new Date().getTime() is equivalent, but slower).
  • If you want to do this properly, then check out Oracle's jmh tool or Google's Caliper.

    My Test Results

    Since I was kind of interested to see these numbers myself, here is the output of jmh . First, the test code:

    public class Benchmark1
    {
      static Integer[] ints = new Integer[0];
      static {
        final List<Integer> list = new ArrayList(asList(1,2,3,4,5,6,7,8,9,10));
        for (int i = 0; i < 5; i++) list.addAll(list);
        ints = list.toArray(ints);
      }
      static List<Integer> intList = Arrays.asList(ints);
      static Vector<Integer> vec = new Vector<Integer>(intList);
      static List<Integer> list = new ArrayList<Integer>(intList);
    
      @GenerateMicroBenchmark
      public Vector<Integer> testVectorAdd() {
        final Vector<Integer> v = new Vector<Integer>();
        for (Integer i : ints) v.add(i);
        return v;
      }
      @GenerateMicroBenchmark
      public long testVectorTraverse() {
        long sum = (long)Math.random()*10;
        for (int i = 0; i < vec.size(); i++) sum += vec.get(i);
        return sum;
      }
      @GenerateMicroBenchmark
      public List<Integer> testArrayListAdd() {
        final List<Integer> l = new ArrayList<Integer>();
        for (Integer i : ints) l.add(i);
        return l;
      }
      @GenerateMicroBenchmark
      public long testArrayListTraverse() {
        long sum = (long)Math.random()*10;
        for (int i = 0; i < list.size(); i++) sum += list.get(i);
        return sum;
      }
    }
    

    And the results:

    testArrayListAdd          234.896  ops/msec
    testVectorAdd             274.886  ops/msec
    testArrayListTraverse    1718.711  ops/msec
    testVectorTraverse         34.843  ops/msec
    

    Note the following:

  • in the ...add methods I am creating a new, local collection. The JIT compiler uses this fact and elides the locking on Vector methods—hence almost equal performance;
  • in the ...traverse methods I am reading from a global collection; the locks cannot be elided and this is where the true performance penalty of Vector shows up.
  • The main takeaway from this should be: the performance model on the JVM is highly complex, sometimes even erratic. Extrapolating from microbenchmarks, even when they are done with all due care, can lead to dangerously wrong predictions about production system performance.


    I agree with Marko about using Caliper, it's an awesome framework.

    But you can get a part of it done yourself if you organize your benchmark a bit better:

    public class ComparePerformance {
    
        private static final int SIZE = 1000000;
        private static final int RUNS = 500;
        private static final Integer ONE = Integer.valueOf(1);
    
        static class Run {
            private final List<Integer> list;
    
            Run(final List<Integer> list) {
                this.list = list;
            }
    
            public long perform() {
                long oldNanos = System.nanoTime();
                for (int i = 0; i < SIZE; i++) {
                    list.add(ONE);
                }
    
                return System.nanoTime() - oldNanos;
            }
        }
    
        public static void main(final String[] args) {
    
            long arrayListTotal = 0L;
            long vectorTotal = 0L;
            for (int i = 0; i < RUNS; i++) {
                if (i % 50 == 49) {
                    System.out.println("Run " + (i + 1));
                }
    
                arrayListTotal += new Run(new ArrayList<Integer>()).perform();
                vectorTotal += new Run(new Vector<Integer>()).perform();
            }
    
            System.out.println();
    
    
            System.out.println("Runs: "+RUNS+", list size: "+SIZE);
            output(arrayListTotal, "List");
            output(vectorTotal, "Vector");
        }
    
        private static void output(final long value, final String name) {
            System.out.println(name + " total time: " + value + " (" + TimeUnit.NANOSECONDS.toMillis(value) + " " + "ms)");
    
            long avg = value / RUNS;
            System.out.println(name + " average time: " + avg + " (" + TimeUnit.NANOSECONDS.toMillis(avg) + " " + "ms)");
        }
    }
    

    The key part is running your code, often. Also, remove stuff that's unrelated to your benchmark. Re-use Integers instead of creating new ones.

    The above benchmark code creates this output on my machine:

    Runs: 500, list size: 1000000
    List total time: 3524708559 (3524 ms)
    List average time: 7049417 (7 ms)
    Vector total time: 6459070419 (6459 ms)
    Vector average time: 12918140 (12 ms)
    

    I'd say that should give you an idea of the performance differences.


    As Marko Topolnik said, it is hard to write correct microbenchmarks and to interprete the results correctly. There are good articles about this subject availible.

    From my experience and what I know of the implementation I use this rule of thumb:

  • Use ArrayList
  • If the collection must be synchronized consider the usage of vector. (I never end up using it, because there are other solutions for synchronization, concurrency and parallel programming)
  • If there are many elements in the collection and there are frequent insert or remove operations inside the list (not at the end) then use LinkedList
  • Most collections do not contain many elements and it would be a waste of time to spend more effort to them. Also in scala there are parallel collections, which perform some operations in parallel. Maybe there is something available for use in pure Java, too.

    Whenever possible use the List interface to hide implementation details and try to add comments which show your reasons WHY you've chosen a specific implementation.

    链接地址: http://www.djcxy.com/p/76102.html

    上一篇: Java应该将数组视为对象吗?

    下一篇: Vector与ArrayList的性能