Fastest way to generate 1,000,000+ random numbers in Python

I am currently writing an app in Python that needs to generate large amounts of random numbers, fast. Currently I use numpy to generate all of the numbers in one giant batch (about 500,000 at a time). While this seems to be faster than Python's own implementation, I still need it to go faster. Any ideas? I'm open to writing it in C and embedding it in the program, or doing whatever it takes.

Constraints on the random numbers:

  • A set of 7 numbers that can all have different bounds:
      • e.g. [0-X1, 0-X2, 0-X3, 0-X4, 0-X5, 0-X6, 0-X7]
      • Currently I generate a list of 7 numbers with random values in [0, 1) and then multiply by [X1..X7]
  • A set of 13 numbers that all add up to 1
      • Currently I just generate 13 numbers and then divide by their sum

Any ideas? Would precalculating these numbers and storing them in a file make this faster?

    Thanks!


    You can speed things up a bit from what mtrw posted just by doing what you initially described (generating a bunch of random numbers and multiplying and dividing accordingly)...

    Also, you probably already know this, but be sure to do the operations in-place (*=, /=, +=, etc) when working with large-ish numpy arrays. It makes a huge difference in memory usage with large arrays, and will give a considerable speed increase, too.

    In [53]: def rand_row_doubles(row_limits, num):
       ....:     ncols = len(row_limits)
       ....:     x = np.random.random((num, ncols))
       ....:     x *= row_limits                  
       ....:     return x                          
       ....:                                       
    In [59]: %timeit rand_row_doubles(np.arange(7) + 1, 1000000)
    10 loops, best of 3: 187 ms per loop
    

    As compared to the version mtrw posted:

    In [66]: %timeit ManyRandDoubles(np.arange(7) + 1, 1000000)
    1 loops, best of 3: 222 ms per loop
    

    It's not a huge difference, but if you're really worried about speed, it's something.
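
    To make the in-place advice concrete, here is a minimal sketch (the array sizes and variable names are only illustrative) showing that `*=` reuses the existing buffer, while `x * limits` allocates a second array of the same size:

```python
import numpy as np

limits = np.arange(7) + 1            # per-column upper bounds 1..7
x = np.random.random((1000, 7))      # uniform samples in [0, 1)
alias = x                            # second name for the same array object

x *= limits                          # in place: scales the existing buffer
in_place_kept_buffer = alias is x    # x is still the very same array

y = x * limits                       # out of place: allocates a new array
copy_shares_memory = np.shares_memory(x, y)   # the new array has its own buffer
```

    With a million rows the extra allocation is tens of megabytes per temporary, which is where the memory and speed difference comes from.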

    Just to show that it's correct:

    In [67]: x = rand_row_doubles(np.arange(7) + 1, 1000000)

    In [68]: x.max(0)
    Out[68]:
    array([ 0.99999991,  1.99999971,  2.99999737,  3.99999569,  4.99999836,
            5.99999114,  6.99999738])
    
    In [69]: x.min(0)
    Out[69]:
    array([  4.02099599e-07,   4.41729377e-07,   4.33480302e-08,
             7.43497138e-06,   1.28446819e-05,   4.27614385e-07,
             1.34106753e-05])
    

    Likewise, for your "rows sum to one" part...

    In [70]: def rand_rows_sum_to_one(nrows, ncols):
       ....:     x = np.random.random((ncols, nrows))
       ....:     y = x.sum(axis=0)
       ....:     x /= y
       ....:     return x.T
       ....:
    
    In [71]: %timeit rand_rows_sum_to_one(1000000, 13)
    1 loops, best of 3: 455 ms per loop
    
    In [72]: x = rand_rows_sum_to_one(1000000, 13)
    
    In [73]: x.sum(axis=1)
    Out[73]: array([ 1.,  1.,  1., ...,  1.,  1.,  1.])
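
    If your NumPy is recent enough to have `np.random.default_rng` (an assumption; it was added in NumPy 1.17), the same two helpers can be sketched with the newer Generator API. The `_rng` function names are made up for this example, and whether it is actually faster on your machine is something to check with %timeit rather than take on faith:

```python
import numpy as np

rng = np.random.default_rng()   # Generator API; assumes NumPy >= 1.17

def rand_row_doubles_rng(row_limits, num):
    """num rows; column j holds uniform values in [0, row_limits[j])."""
    x = rng.random((num, len(row_limits)))
    x *= row_limits                      # in place, broadcast across rows
    return x

def rand_rows_sum_to_one_rng(nrows, ncols):
    """nrows rows of ncols non-negative values, each row summing to 1."""
    x = rng.random((nrows, ncols))
    x /= x.sum(axis=1, keepdims=True)    # in place, no transpose needed
    return x
```

    Usage mirrors the functions above, e.g. `rand_row_doubles_rng(np.arange(7) + 1, 1000000)` and `rand_rows_sum_to_one_rng(1000000, 13)`.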
    

    Honestly, even if you re-implement things in C, I'm not sure you'll be able to beat numpy by much on this one... I could be very wrong, though!


    EDIT: Created functions that return the full set of numbers, not just one row at a time.

    EDIT 2: Made the functions more Pythonic (and faster), and added a solution for the second question.

    For the first set of numbers, you might consider numpy.random.randint or numpy.random.uniform, which take low and high parameters. Generating an array of 7 x 1,000,000 numbers in a specified range seems to take less than 0.7 seconds on my 2 GHz machine:

    def LimitedRandInts(XLim, N):
        rowlen = (1,N)
        return [np.random.randint(low=0,high=lim,size=rowlen) for lim in XLim]
    
    def LimitedRandDoubles(XLim, N):
        rowlen = (1,N)
        return [np.random.uniform(low=0,high=lim,size=rowlen) for lim in XLim]
    
    >>> import numpy as np
    >>> N = 1000000 #number of randoms in each range
    >>> xLim = [x*500 for x in range(1,8)] #convenient limit generation
    >>> fLim = [x/7.0 for x in range(1,8)]
    >>> aa = LimitedRandInts(xLim, N)
    >>> ff = LimitedRandDoubles(fLim, N)
    

    This returns integers in [0,xLim-1] or floats in [0,fLim). The integer version took ~0.3 seconds, the double ~0.66, on my 2 GHz single-core machine.
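
    As a possible variation (a sketch, not a benchmarked replacement), np.random.uniform also accepts an array for high and broadcasts it against size, so the floats can come back as one (N, 7) array instead of a list of seven (1, N) arrays. The function name below is made up for this example:

```python
import numpy as np

def limited_rand_doubles_2d(x_lim, n):
    """Return an (n, len(x_lim)) array; column j is uniform in [0, x_lim[j])."""
    high = np.asarray(x_lim, dtype=float)
    # high has shape (k,), which broadcasts against size=(n, k)
    return np.random.uniform(low=0.0, high=high, size=(n, high.size))

f_lim = [x / 7.0 for x in range(1, 8)]
ff2 = limited_rand_doubles_2d(f_lim, 1000)
```

    Here ff2[k] is the kth set of 7 numbers, and ff2[:, j] holds every draw for the jth limit.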

    For the second set, I used @Joe Kingston's suggestion.

    def SumToOneRands(NumToSum, N):
        aa = np.random.uniform(low=0,high=1.0,size=(NumToSum,N)) #13 rows by 1000000 columns, for instance
        s = np.reciprocal(aa.sum(0))
        aa *= s
        return aa.T #get back to column major order, so aa[k] is the kth set of 13 numbers
    
    >>> ll = SumToOneRands(13, N)
    

    This takes ~1.6 seconds.

    In all cases, result[k] gives you the kth set of data.


    Try r = 1664525*r + 1013904223, from "an even quicker generator" in "Numerical Recipes in C", 2nd edition, Press et al., ISBN 0521431085, p. 284.
    np.random is certainly "more random"; see Linear congruential generator.

    In Python, use np.uint32 like this:

    python -mtimeit -s '
    import numpy as np
    r = 1
    r = np.array([r], np.uint32)[0]  # 316 py -> 16 us np 
        # python longs can be arbitrarily long, so slow
    ' '
    r = r*1664525 + 1013904223  # NR2 p. 284
    '
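
    The timing above advances a single scalar. One way this could be batched, sketched here under my own assumptions (the helper names and the seeds 0..3 are hypothetical), is to keep a whole uint32 array of states and advance them all at once; unsigned array arithmetic wraps modulo 2**32, which is exactly what the recurrence needs:

```python
import numpy as np

A = np.uint32(1664525)       # multiplier from the recurrence above
C = np.uint32(1013904223)    # increment from the recurrence above

def lcg_step(state):
    """Advance every uint32 state by one LCG step, in place."""
    state *= A               # uint32 array arithmetic wraps modulo 2**32
    state += C
    return state

def lcg_uniform(state):
    """Advance the states, then map them to float64 values in [0, 1)."""
    return lcg_step(state) / 2.0**32

state = np.arange(4, dtype=np.uint32)   # hypothetical seeds 0..3
u = lcg_uniform(state)                  # one float per state
```

    Streams started from neighbouring seeds of the same LCG are strongly correlated, so treat this as an illustration of the arithmetic rather than a drop-in replacement for np.random.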
    