功能提高速度:将整数转换为32位整数列表

我正在寻找快速替代我的功能。 目标是根据任意长度的整数制作32位整数列表。 长度明确地以(值,比特长度)的元组给出。 这是一个异步接口的一个bit-banging过程的一部分,每个总线事务处理需要4个32位整数。

所有整数都是无符号,正整数或零,长度可以在0到2000之间变化

我的输入是这些元组的列表,输出应该是隐式32位长度的整数,并且这些位按顺序排列。 其余的位不适合32也应该返回。

input: [(0,128),(1,12),(0,32)]
output:[0, 0, 0, 0, 0x100000], 0, 12

我花了一两天的时间来分析cProfile,并尝试了不同的方法,但是我似乎被某种函数卡住了,这些函数在一秒内需要大约100k个元组,这很慢。 理想情况下,我想要一个10倍的加速,但我没有足够的经验知道从哪里开始。 速度的最终目标是比每秒4M元组更快。

感谢您的任何帮助或建议。

我能做的最快的是:

def foo(tuples):
    '''make a list of tuples of (int, length) into a list of 32 bit integers [1,2,3]'''
    length = 0
    remlen = 0
    remint = 0
    i32list = []
    for a, b in tuples:
        n = (remint << (32-remlen)) | a #n = (a << (remlen)) | remint
        length += b
        if length > 32:
            len32 = int(length/32)
            for i in range(len32):
                i32list.append((n >> i*32) & 0xFFFFFFFF)
            remint = n >> (len32*32)
            remlen = length - len32*32
            length = remlen
        elif length == 32:
            appint = n & 0xFFFFFFFF
            remint = 0
            remlen = 0
            length -= 32
            i32list.append(appint)
        else:
            remint = n
            remlen = length
    return i32list, remint, remlen

这有非常相似的表现:

def tpli_2_32ili(tuples):
    '''make a list of tuples of (int, length) into a list of 32 bit integers [1,2,3]'''
#    binarylist = "".join([np.binary_repr(a, b) for a, b in inp]) # bin(a)[2:].rjust(b, '0')
    binarylist = "".join([bin(a)[2:].rjust(b, '0') for a, b in tuples])
    totallength = len(binarylist)
    tot32 = int(totallength/32)
    i32list = [int(binarylist[i:i+32],2) for i in range(0, tot32*32, 32) ]
    remlen = totallength - tot32*32
    remint = int(binarylist[-remlen:],2)
    return i32list, remint, remlen

到目前为止我能想到的最好的方法是加速25%

from functools import reduce

intMask = 0xffffffff

def f(x,y):
    return (x[0] << y[1]) + y[0], x[1] + y[1]

def jens(input):
    n, length = reduce( f , input, (0,0) )
    remainderBits = length % 32
    intBits = length - remainderBits
    remainder = ((n & intMask) << (32 - remainderBits)) >> (32 - remainderBits)
    n >>= remainderBits

    ints = [n & (intMask << i) for i in range(intBits-32, -32, -32)]
    return ints, remainderBits, remainder

print([hex(x) for x in jens([(0,128),(1,12),(0,32)])[0]])

它使用long来根据它们的位长度来总结元组的值,然后从这个数中提取32位值和其余位。 总体长度的计算(总结输入元组的长度值)和大值的计算是在一个单一的循环中完成的,减少使用一个内部循环。

运行matineau的基准线束打印,我看到的最好的数字是:

Fastest to slowest execution speeds using Python 3.6.5
(1,000 executions, best of 3 repetitions)

          jens :  0.004151 secs, rel speed  1.00x,     0.00% slower
 First snippet :  0.005259 secs, rel speed  1.27x,    26.70% slower
Second snippet :  0.008328 secs, rel speed  2.01x,   100.64% slower

如果您使用一些实现位数组的C扩展,您可能会获得更好的加速。


这不是快速实施的答案。 相反,它是您在问题中的两个代码片段中的代码放置在可扩展的基准测试框架内,这使得比较不同的方法变得非常简单。

只比较这两个测试用例,它表明您的第二种方法第一种方法的性能相似,根据显示的输出结果。 事实上,它几乎是速度的两倍。

from collections import namedtuple
import sys
from textwrap import dedent
import timeit
import traceback

N = 1000  # Number of executions of each "algorithm".
R = 3  # Number of repetitions of those N executions.

# Common setup for all testcases (executed before any algorithm specific setup).
COMMON_SETUP = dedent("""
    # Import any resources needed defined in outer benchmarking script.
    #from __main__ import ??? # Not needed at this time
""")


class TestCase(namedtuple('CodeFragments', ['setup', 'test'])):
    """ A test case is composed of separate setup and test code fragments. """
    def __new__(cls, setup, test):
        """ Dedent code fragment in each string argument. """
        return tuple.__new__(cls, (dedent(setup), dedent(test)))


testcases = {
    "First snippet": TestCase("""
        def foo(tuples):
            '''make a list of tuples of (int, length) into a list of 32 bit integers [1,2,3]'''
            length = 0
            remlen = 0
            remint = 0
            i32list = []
            for a, b in tuples:
                n = (remint << (32-remlen)) | a #n = (a << (remlen)) | remint
                length += b
                if length > 32:
                    len32 = int(length/32)
                    for i in range(len32):
                        i32list.append((n >> i*32) & 0xFFFFFFFF)
                    remint = n >> (len32*32)
                    remlen = length - len32*32
                    length = remlen
                elif length == 32:
                    appint = n & 0xFFFFFFFF
                    remint = 0
                    remlen = 0
                    length -= 32
                    i32list.append(appint)
                else:
                    remint = n
                    remlen = length

            return i32list, remint, remlen
        """, """
        foo([(0,128),(1,12),(0,32)])
        """

    ),
    "Second snippet": TestCase("""
        def tpli_2_32ili(tuples):
            '''make a list of tuples of (int, length) into a list of 32 bit integers [1,2,3]'''
            binarylist = "".join([bin(a)[2:].rjust(b, '0') for a, b in tuples])
            totallength = len(binarylist)
            tot32 = int(totallength/32)
            i32list = [int(binarylist[i:i+32],2) for i in range(0, tot32*32, 32) ]
            remlen = totallength - tot32*32
            remint = int(binarylist[-remlen:],2)
            return i32list, remint, remlen
        """, """
        tpli_2_32ili([(0,128),(1,12),(0,32)])
        """
    ),
}

# Collect timing results of executing each testcase multiple times.
try:
    results = [
        (label,
         min(timeit.repeat(testcases[label].test,
                           setup=COMMON_SETUP + testcases[label].setup,
                           repeat=R, number=N)),
        ) for label in testcases
    ]
except Exception:
    traceback.print_exc(file=sys.stdout)  # direct output to stdout
    sys.exit(1)

# Display results.
major, minor, micro = sys.version_info[:3]
print('Fastest to slowest execution speeds using Python {}.{}.{}n'
      '({:,d} executions, best of {:d} repetitions)'.format(major, minor, micro, N, R))
print()

longest = max(len(result[0]) for result in results)  # length of longest label
ranked = sorted(results, key=lambda t: t[1]) # ascending sort by execution time
fastest = ranked[0][1]
for result in ranked:
    print('{:>{width}} : {:9.6f} secs, rel speed {:5,.2f}x, {:8,.2f}% slower '
          ''.format(
                result[0], result[1], round(result[1]/fastest, 2),
                round((result[1]/fastest - 1) * 100, 2),
                width=longest))

输出:

Fastest to slowest execution speeds using Python 3.6.5
(1,000 executions, best of 3 repetitions)

 First snippet :  0.003024 secs, rel speed  1.00x,     0.00% slower
Second snippet :  0.005085 secs, rel speed  1.68x,    68.13% slower
链接地址: http://www.djcxy.com/p/839.html

上一篇: Function speed improvement: Convert ints to list of 32bit ints

下一篇: Viewing Unpushed Git Commits