What does the "yield" keyword do?

2018-05-28 22:23:14

What is the use of the yield keyword in Python? What does it do?

For example, I'm trying to understand this code [1]:

def _get_child_candidates(self, distance, min_dist, max_dist):
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild

And this is the caller:

result, candidates = [], [self]
while candidates:
    node = candidates.pop()
    distance = node._get_dist(obj)
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))
return result

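What happens when the method _get_child_candidates is called? Is a list returned? A single element? Is it called again? When do the subsequent calls stop?

[1] This code was written by Jochen Schulz (jrschulz), who made a great Python library for metric spaces. This is the link to the complete source: Module mspace.

To understand what yield does, you must understand what generators are. And before you can understand generators, you must understand iterables.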

Iterables

When you create a list, you can read its items one by one. Reading its items one by one is called iteration:

>>> mylist = [1, 2, 3]
>>> for i in mylist:
...    print(i)
1
2
3

mylist is an iterable. When you use a list comprehension, you create a list, and therefore an iterable:

>>> mylist = [x*x for x in range(3)]
>>> for i in mylist:
...    print(i)
0
1
4

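Everything you can use "for... in..." on is an iterable: lists, strings, files...

These iterables are handy because you can read them as many times as you wish, but all their values are stored in memory, and that is not always what you want when you have a lot of values.

Generators

Generators are iterators, a kind of iterable that you can only iterate over once. Generators do not store all the values in memory; they generate the values on the fly: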

>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...    print(i)
0
1
4

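It is just the same, except that you used () instead of []. BUT you cannot perform for i in mygenerator a second time, since generators can only be used once: they calculate 0, then forget about it and calculate 1, and finish by calculating 4, one value at a time.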
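
The "only once" behaviour described above is easy to check. Here is a minimal sketch (the variable names are made up for illustration) that reads the same generator expression twice and compares it with a list:

```python
squares = (x * x for x in range(3))

first_pass = list(squares)   # consumes every value the generator can produce
second_pass = list(squares)  # the generator is already exhausted

print(first_pass)   # [0, 1, 4]
print(second_pass)  # []

# A list, by contrast, can be read as many times as you like.
squares_list = [x * x for x in range(3)]
rereads_match = list(squares_list) == list(squares_list)
print(rereads_match)  # True
```

If you need the values more than once, either rebuild the generator or store the values in a list.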

Yield

yield is a keyword that is used like return, except that the function will return a generator.

>>> def createGenerator():
...    mylist = range(3)
...    for i in mylist:
...        yield i*i
...
>>> mygenerator = createGenerator() # create a generator
>>> print(mygenerator) # mygenerator is an object!
<generator object createGenerator at 0xb7555c34>
>>> for i in mygenerator:
...     print(i)
0
1
4

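This is a useless example, but it is handy when you know your function will return a huge set of values that you will only need to read once.

To master yield, you must understand that when you call the function, the code you have written in the function body does not run. The function only returns the generator object. This is a bit tricky :-)

Then your code continues from where it left off each time the for loop uses the generator.

Now the hard part:

The first time the for loop calls the generator object created from your function, it runs the code in your function from the beginning until it hits yield, then it returns the first value of the loop. Each subsequent call resumes right after that yield, runs one more iteration of the loop you wrote in the function, and returns the next value. This continues until there are no values left to return.

The generator is considered empty once the function runs without hitting yield anymore. That can be because the loop has come to an end, or because an "if/else" condition is no longer satisfied.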
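
To watch this happen step by step, here is a small sketch. The function name traced_squares and the log list are invented for this illustration; the function records what it is doing so you can see which parts of the body have run after each call to next():

```python
def traced_squares(log):
    log.append("body started")
    for i in range(2):
        yield i * i
        log.append("resumed after yielding %d" % (i * i))
    log.append("body finished")

log = []
gen = traced_squares(log)
log_after_call = list(log)        # calling the function runs none of its body

first = next(gen)                 # runs up to the first yield
log_after_first = list(log)

second = next(gen)                # resumes after the first yield, stops at the second
log_after_second = list(log)

leftover = next(gen, "exhausted")  # body runs to the end; the default is returned

print(log_after_call)    # []
print(log_after_first)   # ['body started']
print(log_after_second)  # ['body started', 'resumed after yielding 0']
print(first, second, leftover)  # 0 1 exhausted
```

Passing a default as the second argument of next() is only a convenience here; without it, the last call would raise StopIteration.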

Your code explained

Generator:

# Here you create the method of the node object that will return the generator
def _get_child_candidates(self, distance, min_dist, max_dist):

    # Here is the code that will be called each time you use the generator object:

    # If there is still a child of the node object on its left
    # AND if distance is ok, return the next child
    if self._leftchild and distance - max_dist < self._median:
        yield self._leftchild

    # If there is still a child of the node object on its right
    # AND if distance is ok, return the next child
    if self._rightchild and distance + max_dist >= self._median:
        yield self._rightchild

    # If the function arrives here, the generator will be considered empty
    # there is no more than two values: the left and the right children

Caller:

# Create an empty list and a list with the current object reference
result, candidates = list(), [self]

# Loop on candidates (they contain only one element at the beginning)
while candidates:

    # Get the last candidate and remove it from the list
    node = candidates.pop()

    # Get the distance between obj and the candidate
    distance = node._get_dist(obj)

    # If distance is ok, then you can fill the result
    if distance <= max_dist and distance >= min_dist:
        result.extend(node._values)

    # Add the children of the candidate in the candidates list
    # so the loop will keep running until it will have looked
    # at all the children of the children of the children, etc. of the candidate
    candidates.extend(node._get_child_candidates(distance, min_dist, max_dist))

return result

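This code contains several smart parts:

The loop iterates over a list, but the list grows while the loop is running :-) It is a concise way to go through all this nested data, even if it is a bit dangerous, since you can end up with an infinite loop. In this case, candidates.extend(node._get_child_candidates(distance, min_dist, max_dist)) exhausts all the values of the generator, but while keeps creating new generator objects, which produce different values from the previous ones because they are not applied to the same node.

The extend() method is a list object method that expects an iterable and adds its values to the list.

Usually we pass a list to it: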

>>> a = [1, 2]
>>> b = [3, 4]
>>> a.extend(b)
>>> print(a)
[1, 2, 3, 4]

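But in your code it gets a generator, which is good because:

You don't need to read the values twice.

You may have a lot of children and you don't want them all stored in memory.

And it works because Python does not care whether the argument of a method is a list or not. Python expects iterables, so it will work with strings, lists, tuples, and generators! This is called duck typing and is one of the reasons why Python is so cool. But that is another story, for another question...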
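
To try the whole pattern discussed above (a list that grows while it is being consumed, fed by extend() from a generator), here is a toy sketch. The Node class, its children() method, and collect_values() are hypothetical stand-ins written for this illustration; they are not the mspace code and they ignore the distance checks:

```python
class Node:
    """Hypothetical tree node, only for illustration (not the mspace class)."""

    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

    def children(self):
        # Generator: yields at most two values, then it is exhausted.
        if self.left is not None:
            yield self.left
        if self.right is not None:
            yield self.right


def collect_values(root):
    result, candidates = [], [root]
    while candidates:
        node = candidates.pop()              # take the last candidate
        result.append(node.value)
        # extend() accepts any iterable, so a generator works here.
        candidates.extend(node.children())
    return result


tree = Node(1, Node(2, Node(4), Node(5)), Node(3))
visited = collect_values(tree)
print(visited)  # [1, 3, 2, 5, 4]
```

Each call to children() creates a fresh generator for a different node, which is why the loop eventually runs out of candidates on a finite tree.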

You can stop here, or read a little further to see an advanced use of a generator:

Controlling a generator's exhaustion

>>> class Bank(): # let's create a bank, building ATMs
...    crisis = False
...    def create_atm(self):
...        while not self.crisis:
...            yield "$100"
>>> hsbc = Bank() # when everything's ok the ATM gives you as much as you want
>>> corner_street_atm = hsbc.create_atm()
>>> print(corner_street_atm.next())
$100
>>> print(corner_street_atm.next())
$100
>>> print([corner_street_atm.next() for cash in range(5)])
['$100', '$100', '$100', '$100', '$100']
>>> hsbc.crisis = True # crisis is coming, no more money!
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> wall_street_atm = hsbc.create_atm() # it's even true for new ATMs
>>> print(wall_street_atm.next())
<type 'exceptions.StopIteration'>
>>> hsbc.crisis = False # trouble is, even post-crisis the ATM remains empty
>>> print(corner_street_atm.next())
<type 'exceptions.StopIteration'>
>>> brand_new_atm = hsbc.create_atm() # build a new one to get back in business
>>> for cash in brand_new_atm:
...    print(cash)
$100
$100
$100
$100
$100
$100
$100
$100
$100
...

Note: For Python 3, use print(corner_street_atm.__next__()) or print(next(corner_street_atm))

This can be useful for various things, such as controlling access to a resource.
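
For reference, here is a sketch of the same bank idea written for Python 3. It assumes the built-in next() and catches StopIteration explicitly, since an exhausted generator raises that exception rather than returning a value. The variable names are made up for this illustration:

```python
class Bank:
    crisis = False

    def create_atm(self):
        while not self.crisis:
            yield "$100"


bank = Bank()
atm = bank.create_atm()
withdrawals = [next(atm) for _ in range(3)]

bank.crisis = True
try:
    next(atm)                 # the while condition is now false, so the body ends
    atm_exhausted = False
except StopIteration:
    atm_exhausted = True

bank.crisis = False
# An exhausted generator stays exhausted, even after the crisis is over.
after_crisis = next(atm, "still empty")

# A brand-new generator works again.
new_atm = bank.create_atm()
fresh_withdrawal = next(new_atm)

print(withdrawals)        # ['$100', '$100', '$100']
print(atm_exhausted)      # True
print(after_crisis)       # still empty
print(fresh_withdrawal)   # $100
```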

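Itertools, your best friend

The itertools module contains special functions to manipulate iterables. Ever wish to duplicate a generator? Chain two generators? Group values in a nested list with a one-liner? Map / Zip without creating another list?

Then just import itertools.

An example? Let's see the possible orders of arrival for a four-horse race: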

>>> import itertools
>>> horses = [1, 2, 3, 4]
>>> races = itertools.permutations(horses)
>>> print(races)
<itertools.permutations object at 0xb754f1dc>
>>> print(list(itertools.permutations(horses)))
[(1, 2, 3, 4),
 (1, 2, 4, 3),
 (1, 3, 2, 4),
 (1, 3, 4, 2),
 (1, 4, 2, 3),
 (1, 4, 3, 2),
 (2, 1, 3, 4),
 (2, 1, 4, 3),
 (2, 3, 1, 4),
 (2, 3, 4, 1),
 (2, 4, 1, 3),
 (2, 4, 3, 1),
 (3, 1, 2, 4),
 (3, 1, 4, 2),
 (3, 2, 1, 4),
 (3, 2, 4, 1),
 (3, 4, 1, 2),
 (3, 4, 2, 1),
 (4, 1, 2, 3),
 (4, 1, 3, 2),
 (4, 2, 1, 3),
 (4, 2, 3, 1),
 (4, 3, 1, 2),
 (4, 3, 2, 1)]
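
The other tasks mentioned earlier (duplicating a generator, chaining two generators, flattening a nested list, lazy map/zip) can be sketched with standard itertools functions as follows. The sample data is made up for illustration:

```python
import itertools

# tee: "duplicate" one generator into two independent iterators.
first, second = itertools.tee(x * x for x in range(3))
first_values = list(first)
second_values = list(second)

# chain: run through several iterables back to back.
chained = list(itertools.chain((n for n in range(2)), (c for c in "ab")))

# chain.from_iterable: flatten one level of a nested list.
flattened = list(itertools.chain.from_iterable([[1, 2], [3], [4, 5]]))

# In Python 3 the built-in zip (like map) is already lazy:
# nothing is paired up until a value is requested.
lazy_pairs = zip(range(3), "abc")
first_pair = next(lazy_pairs)

print(first_values, second_values)  # [0, 1, 4] [0, 1, 4]
print(chained)                      # [0, 1, 'a', 'b']
print(flattened)                    # [1, 2, 3, 4, 5]
print(first_pair)                   # (0, 'a')
```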

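Understanding the inner mechanisms of iteration

Iteration is a process involving iterables (which implement the __iter__() method) and iterators (which implement the __next__() method). Iterables are any objects you can get an iterator from. Iterators are objects that let you iterate over iterables.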
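
The distinction between the two can be checked directly. This is a minimal sketch using a plain list and the built-in iter() and next() functions (Python 3 method names assumed):

```python
numbers = [1, 2, 3]

# A list is an iterable: it has __iter__ but no __next__.
list_has_iter = hasattr(numbers, "__iter__")
list_has_next = hasattr(numbers, "__next__")

# iter() asks the iterable for an iterator.
iterator = iter(numbers)
iterator_has_next = hasattr(iterator, "__next__")

# An iterator is its own iterator.
iterator_is_own_iterator = iter(iterator) is iterator

first = next(iterator)
rest = list(iterator)        # whatever the iterator has not produced yet
again = list(iter(numbers))  # a fresh iterator starts from the beginning

print(list_has_iter, list_has_next)   # True False
print(iterator_has_next)              # True
print(iterator_is_own_iterator)       # True
print(first, rest, again)             # 1 [2, 3] [1, 2, 3]
```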

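There is more about this in this article on how for loops work.

Shortcut to understanding yield

When you see a function with yield statements, apply this easy trick to understand what will happen:

1. Insert a line result = [] at the start of the function.

2. Replace each yield expr with result.append(expr).

3. Insert a line return result at the bottom of the function.

4. Yay - no more yield statements! Read and figure out the code.

5. Compare the function to the original definition.

This trick may give you an idea of the logic behind the function, but what actually happens with yield is significantly different from what happens in the list-based approach. In many cases, the yield approach will be a lot more memory efficient, and faster too. In other cases, this trick will get you stuck in an infinite loop, even though the original function works just fine. Read on to learn more...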
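
As an illustration of the trick and of its limits, here is a sketch with made-up function names. evens_list is what you get by applying the steps above to evens_gen. naturals is a generator that never ends, so the list rewrite of it would loop forever, while the generator itself can still be used safely with itertools.islice:

```python
from itertools import islice

def evens_gen(limit):
    n = 0
    while n < limit:
        yield n
        n += 2

def evens_list(limit):
    result = []              # step 1
    n = 0
    while n < limit:
        result.append(n)     # step 2: was "yield n"
        n += 2
    return result            # step 3

def naturals():
    # Infinite generator: the list rewrite of this function would never return.
    n = 0
    while True:
        yield n
        n += 1

same_values = list(evens_gen(7)) == evens_list(7)
first_five = list(islice(naturals(), 5))

print(evens_list(7))  # [0, 2, 4, 6]
print(same_values)    # True
print(first_five)     # [0, 1, 2, 3, 4]
```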

Don't confuse your iterables, iterators, and generators

First, the iterator protocol - when you write

for x in mylist:
    ...loop body...

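Python performs the following two steps:

1. Gets an iterator for mylist:

Call iter(mylist) -> this returns an object with a next() method (or __next__() in Python 3).

[This is the step most people forget to tell you about]

2. Uses the iterator to loop over items:

Keep calling the next() method on the iterator returned from step 1. The return value from next() is assigned to x and the loop body is executed. If a StopIteration exception is raised from within next(), it means there are no more values in the iterator and the loop is exited.

The truth is that Python performs the above two steps any time it wants to loop over the contents of an object - so it could be a for loop, but it could also be code like otherlist.extend(mylist) (where otherlist is a Python list).

Here mylist is an iterable because it implements the iterator protocol. In a user-defined class, you can implement the __iter__() method to make instances of your class iterable. This method should return an iterator. An iterator is an object with a next() method (__next__() in Python 3). It is possible to implement both __iter__() and next() on the same class, and have __iter__() return self. This will work for simple cases, but not when you want two iterators looping over the same object at the same time.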
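
Here is a sketch of the "separate iterator" design, for the case where two loops need to run over the same object at the same time. Countdown and CountdownIterator are hypothetical names invented for this example. The iterator defines __next__ for Python 3 and aliases it as next for Python 2:

```python
class Countdown:
    """Iterable: each call to __iter__ hands out a new, independent iterator."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: keeps its own position, separate from the iterable."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__  # Python 2 spelling of the same method


countdown = Countdown(3)

# Two nested loops over the same object: each loop gets its own iterator.
pairs = [(a, b) for a in countdown for b in countdown]

print(list(countdown))  # [3, 2, 1]
print(len(pairs))       # 9
```

If __iter__() returned self on a single class that also tracked the position, the inner loop would consume the state of the outer one and fewer than nine pairs would be produced.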

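So that's the iterator protocol. Many objects implement this protocol:

Built-in lists, dictionaries, tuples, sets, files.

User-defined classes that implement __iter__().

Generators.

Note that a for loop doesn't know what kind of object it's dealing with - it just follows the iterator protocol, and is happy to get item after item as it calls next(). Built-in lists return their items one by one, dictionaries return their keys one by one, files return their lines one by one, etc. And generators return... well, that's where yield comes in: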

def f123():
    yield 1
    yield 2
    yield 3

for item in f123():
    print(item)

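If you had three return statements in f123() instead of yield statements, only the first would get executed, and the function would exit. But f123() is no ordinary function. When f123() is called, it does not return any of the values in the yield statements! It returns a generator object. Also, the function does not really exit - it goes into a suspended state. When the for loop tries to loop over the generator object, the function resumes from its suspended state at the very next line after the yield it previously returned from, executes the next line of code (in this case, a yield statement), and returns that as the next item. This happens until the function exits, at which point the generator raises StopIteration, and the loop exits.

So the generator object is sort of like an adapter - at one end it exhibits the iterator protocol, by exposing __iter__() and next() methods to keep the for loop happy. At the other end, however, it runs the function just enough to get the next value out of it, and puts it back in suspended mode.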
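
You can play the role of the for loop by hand to see the adapter at work. This sketch repeats f123() from above and uses the built-in next() function; the variable names are made up for illustration:

```python
def f123():
    yield 1
    yield 2
    yield 3

gen = f123()

# The generator object is its own iterator, as the protocol requires.
is_own_iterator = iter(gen) is gen

# Each next() call runs the function just far enough to reach the next yield.
values = [next(gen), next(gen), next(gen)]

# One more call lets the function body finish, which raises StopIteration.
try:
    next(gen)
    finished = False
except StopIteration:
    finished = True

print(is_own_iterator)  # True
print(values)           # [1, 2, 3]
print(finished)         # True
```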

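Why use generators?

Usually, you can write code that doesn't use generators but implements the same logic. One option is to use the temporary list "trick" I mentioned before. That will not work in all cases, e.g. if you have infinite loops, or it may make inefficient use of memory when you have a really long list. The other approach is to implement a new iterable class SomethingIter that keeps the state in instance members and performs the next logical step in its next() (or __next__() in Python 3) method. Depending on the logic, the code inside the next() method may end up looking very complex and be prone to bugs. Here generators provide a clean and easy solution.

Think of it this way:

An iterator is just a fancy-sounding term for an object that has a next() method. So a yield-ed function ends up being something like this:

Original version: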

def some_function():
    for i in range(4):
        yield i

for i in some_function():
    print(i)

This is basically what the Python interpreter does with the above code:

class it:
    def __init__(self):
        #start at -1 so that we get 0 when we add 1 below.
        self.count = -1
    #the __iter__ method will be called once by the for loop.
    #the rest of the magic happens on the object returned by this method.
    #in this case it is the object itself.
    def __iter__(self):
        return self
    #the next method will be called repeatedly by the for loop
    #until it raises StopIteration.
    def next(self):
        self.count += 1
        if self.count < 4:
            return self.count
        else:
            #a StopIteration exception is raised
            #to signal that the iterator is done.
            #This is caught implicitly by the for loop.
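            raise StopIteration

    # Python 3 looks up __next__ instead of next; this alias keeps the class usable there.
    __next__ = next

def some_func():
    return it()

for i in some_func():
    print(i)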

For more insight into what is happening behind the scenes, the for loop can be rewritten like this:

iterator = some_func()
try:
    while True:
        print(next(iterator))
except StopIteration:
    pass

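Does that make more sense or just confuse you more? :)

EDIT: I should note that this is an oversimplification for illustrative purposes. :)

EDIT 2: Forgot to throw the StopIteration exception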
