Python XML query: get the parent

2018-06-10 02:12:44

I have a large XML file that looks like this:

<Node name="foo">
    <Node name="16764764625">
        <Val name="type"><s>3</s></Val>
        <Val name="owner"><s>1</s></Val>
        <Val name="location"><s>4</s></Val>
        <Val name="brb"><n/></Val>
        <Val name="number"><f>24856</f></Val>
        <Val name="number2"><f>97000.0</f></Val>
    </Node>
    <Node name="1764466544">
        <Val name="type"><s>1</s></Val>
        <Val name="owner"><s>2</s></Val>
        <Val name="location"><s>6</s></Val>
        <Val name="brb"><n/></Val>
        <Val name="number"><f>265456</f></Val>
        <Val name="number2"><f>99000.0</f></Val>
    </Node>
    ...
</Node>

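My task is to get the ID of the parent node, 1764466544 (the value of name in the second Node), by searching for the Node whose Val name="number" child element contains 265456.

I have been doing a lot of reading on XPath and ElementTree, but I am still not sure where to start in order to actually run this query. I have looked for examples, but I cannot find any that return the parent node as the result.

I am still new to Python, so any suggestions would be appreciated.

Thanks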

This XPath:

/Node/Node[Val[@name='number']/f='265456']/@name

outputs:

1764466544

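Unfortunately, when using the ElementTree API, an Element object holds no reference back to its parent, so you cannot walk up the tree from a known point. Instead, you have to find the candidate parents and filter out the ones you want.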
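One common workaround for the missing parent reference is to build a child-to-parent map by walking the tree once. The following is only a minimal sketch: the SAMPLE string is a trimmed-down, hypothetical stand-in for the file in the question, and the names parent_of and found are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample with the same shape as the question's file.
SAMPLE = """
<Node name="foo">
    <Node name="16764764625">
        <Val name="number"><f>24856</f></Val>
    </Node>
    <Node name="1764466544">
        <Val name="number"><f>265456</f></Val>
    </Node>
</Node>
""".strip()

root = ET.fromstring(SAMPLE)

# Map every child element to its parent by visiting each element once.
parent_of = {child: parent for parent in root.iter() for child in parent}

found = []
for f in root.iter('f'):
    val = parent_of[f]
    if f.text == '265456' and val.get('name') == 'number':
        # Climb two levels: <f> -> <Val> -> <Node>.
        found.append(parent_of[val].get('name'))

print(found)
```

The map costs one extra pass over the tree, and it has to be rebuilt if the tree is modified afterwards.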

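This is normally done with XPath expressions. However, ElementTree only supports a subset of XPath (see the documentation), and the most useful parts of that subset were only added in ElementTree 1.3, which ships only with Python 2.7+ or 3.2+.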

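Even then, ElementTree's XPath cannot handle your file as it stands: as far as I could tell, there is no way to select on the text of a node itself, only on its attributes (or attribute values). (Current Python releases do accept limited text predicates such as [tag='text'], so this restriction mainly affects older versions.)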
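For readers on a reasonably recent Python 3, here is a hedged sketch of how the limited XPath subset might still answer the question on the unmodified file. It assumes the interpreter's ElementTree accepts the [tag='text'] predicate and the trailing .. step. The SAMPLE string is a hypothetical, shortened copy of the question's data.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened copy of the XML from the question.
SAMPLE = """
<Node name="foo">
    <Node name="16764764625">
        <Val name="owner"><s>1</s></Val>
        <Val name="number"><f>24856</f></Val>
    </Node>
    <Node name="1764466544">
        <Val name="owner"><s>2</s></Val>
        <Val name="number"><f>265456</f></Val>
    </Node>
</Node>
""".strip()

root = ET.fromstring(SAMPLE)

# Select each <Val name="number"> that has an <f> child with text 265456,
# then step up to its parent <Node> with "..".
matches = root.findall(".//Val[@name='number'][f='265456']/..")
names = [node.get('name') for node in matches]

print(names)
```

The predicate compares the complete text content of the f child, so stray whitespace inside the element would prevent a match.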

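My experiments turned up only two ways to proceed with ElementTree. If you are using Python 2.7+ (or can download and install a newer version of ElementTree for use with an older Python), and you can change the format of the XML file so that the numbers are stored as attributes, like this: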

<Val name="number"><f val="265456" /></Val>

then the following Python code will pull out the nodes of interest:

import xml.etree.ElementTree as ETree
tree = ETree.ElementTree(file='sample.xml')
nodes = tree.findall(".//Node/Val[@name='number']/f[@val='265456']/../..")
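To try the attribute-based query without touching a real file, the same path can be run against an inline string. This is only a sketch: SAMPLE_ATTR is a hypothetical sample in the modified format, and the path ends in /../.. to climb from the f element through Val to the enclosing Node.

```python
import xml.etree.ElementTree as ETree

# Hypothetical sample in the modified format, with the number as an attribute.
SAMPLE_ATTR = """
<Node name="foo">
    <Node name="16764764625">
        <Val name="number"><f val="24856" /></Val>
    </Node>
    <Node name="1764466544">
        <Val name="number"><f val="265456" /></Val>
    </Node>
</Node>
""".strip()

root = ETree.fromstring(SAMPLE_ATTR)

# Match the <f> element by attribute, then go up twice: <f> -> <Val> -> <Node>.
nodes = root.findall(".//Node/Val[@name='number']/f[@val='265456']/../..")
node_names = [node.get('name') for node in nodes]

print(node_names)
```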

For older Pythons, or if you cannot change the XML format, you will have to filter out the unwanted nodes by hand. The following worked for me:

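import xml.etree.ElementTree as ETree
tree = ETree.ElementTree(file='sample.xml')
all_nodes = tree.findall(".//Node")  # renamed from "all" to avoid shadowing the builtin
nodes = []

# Keep each Node that has a <Val name="number"> child whose first
# child element contains the text '265456'.
for node in all_nodes:
    # Iterate over the children directly; getchildren() was removed in Python 3.9.
    for val in node:
        if val.get('name') == 'number' and len(val) and val[0].text == '265456':
            nodes.append(node)
            break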

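Neither of these is what I would call an ideal solution, but they are the only ones I could get to work with the ElementTree library (since that is what you said you were using). You may be better off with a third-party library than with the built-in ones; see the Python wiki entry on XML for a list of options. lxml is the Python binding for the widely used libxml2 library, and it is the one I would suggest looking at first. It has XPath support, so you should be able to use the queries from the other answers.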

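The following function helped me in a similar situation. As the docstring explains, it does not work in the general case, but it should help if the nodes are unique.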

import xml.etree.ElementTree

def get_element_ancestry(root, element):
    '''Return a list of ancestor Elements for the given element.

    If both root and element are of type xml.etree.ElementTree.Element, and if
    the given root contains the given element as a descendant, then return a
    list of direct xml.etree.ElementTree.Element ancestors, starting with root
    and ending with element. Otherwise, return an empty list.

    The xml.etree.ElementTree module offers no function to return the parent of
    a given Element, presumably because an Element may be in more than one tree,
    or even multiple times within a given tree, so its parent depends on the
    context. This function provides a solution in the specific cases where the
    caller either knows that the given element appears just once within the
    tree or is satisfied with the first branch to reference the given element.
    '''
    result = []
    xet = xml.etree.ElementTree
    if not xet.iselement(root) or not xet.iselement(element):
        return result
    # Build a path that matches the element's tag and all of its attributes.
    xpath = './/' + element.tag \
        + ''.join(["[@%s='%s']" % a for a in element.items()])
    parent = root
    while parent is not None:
        result.append(parent)
        for child in parent.findall('*'):
            if child is element:
                result.append(element)
                return result
            # Descend into the first child whose subtree contains the element.
            if child.findall(xpath).count(element):
                parent = child
                break
        else:
            # No child leads to the element, so it is not under root.
            return []
    return result
