Using SelectSingleNode (or SelectNodes) on XML where the default namespace has been set

I've been stumped by this one at least two times over the last couple of years, so I thought it was a good candidate to be written up here.

I was trying to select a node from some standard XHTML where the default namespace was set. In other words, the XHTML was something like:

   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"[]>
   <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
   <head>
   <meta http-equiv="content-type" content="text/html; charset=utf-8" />
   <title>MSN Search News: Microsoft</title> ...

Note the xmlns attribute on the root <html> node.

Without thinking too hard, I first tried to find the title of the page by going ...

   XmlDocument resultsXhtml = new XmlDocument();
   resultsXhtml.Load("https://search.msn.com/news/results.aspx?q=Microsoft");
   XmlNode metaNode = resultsXhtml.SelectSingleNode("//title");

... which left metaNode as null.

This took me a little while to figure out. Clearly I needed to identify in the XPath query that the title tag is in the default namespace, but how could I do that if that namespace has no prefix in the actual XML?

The solution (reasonably obviously!) is to register a prefix of my own choosing in an XmlNamespaceManager object, and then use that namespace manager when doing the select. Here's some code that works:

   XmlDocument resultsXhtml = new XmlDocument();
   resultsXhtml.Load("https://search.msn.com/news/results.aspx?q=Microsoft");

   // Bind a prefix of our own choosing to the XHTML namespace URI.
   XmlNamespaceManager namespaceManager = new XmlNamespaceManager(resultsXhtml.NameTable);
   namespaceManager.AddNamespace("myprefix", "http://www.w3.org/1999/xhtml");

   // The prefixed name now matches <title> in the document's default namespace.
   XmlNode metaNode = resultsXhtml.SelectSingleNode("//myprefix:title", namespaceManager);

 

I think what's interesting about this problem is the way you have to think about namespaces and XPath queries. The namespace is a logical entity denoted by the URI, not by the prefix in the actual XML. Therefore you can register that URI with any prefix you want in your XPath, which isn't a completely intuitive concept - to me at least!
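
To try this without fetching a live page, here is a minimal sketch that runs the same queries against a small in-memory document. The sample markup, the class name NamespacePrefixDemo and the helper SelectTitle are assumptions made up for illustration; only XmlDocument, XmlNamespaceManager and SelectSingleNode come from the code above. It shows the unprefixed query finding nothing, and two arbitrarily chosen prefixes both finding the title once they are bound to the XHTML namespace URI.

```csharp
using System;
using System.Xml;

public static class NamespacePrefixDemo
{
    // The XHTML namespace URI as published by the W3C.
    public const string XhtmlNamespace = "http://www.w3.org/1999/xhtml";

    // Hypothetical stand-in for the downloaded results page.
    public const string SampleXhtml =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\">" +
        "<head><title>MSN Search News: Microsoft</title></head>" +
        "<body /></html>";

    // Returns the title text, or null when the query matches nothing.
    // A null or empty prefix runs the unprefixed "//title" query.
    public static string SelectTitle(string xhtml, string prefix)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xhtml);

        XmlNode node;
        if (string.IsNullOrEmpty(prefix))
        {
            // Unprefixed names in XPath 1.0 only match elements in no namespace.
            node = doc.SelectSingleNode("//title");
        }
        else
        {
            // Bind the caller's prefix to the namespace URI, then query with it.
            XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
            ns.AddNamespace(prefix, XhtmlNamespace);
            node = doc.SelectSingleNode("//" + prefix + ":title", ns);
        }
        return node == null ? null : node.InnerText;
    }

    public static void Main()
    {
        Console.WriteLine(SelectTitle(SampleXhtml, null) ?? "(null)"); // prints "(null)"
        Console.WriteLine(SelectTitle(SampleXhtml, "myprefix"));       // prints "MSN Search News: Microsoft"
        Console.WriteLine(SelectTitle(SampleXhtml, "anything"));       // prints "MSN Search News: Microsoft"
    }
}
```

The prefix only has to be consistent between the AddNamespace call and the XPath string; it never has to appear in the XML itself.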

Comments

  • Anonymous
    November 21, 2005
    Actually, the whole idea that it is the URI that is the logical entity, and NOT the prefix is something that took a while for me to "get" also. It was only when I was working with a lot of files that had an un-prefixed namespace that I finally figured it out!

  • Anonymous
    January 25, 2006
    Thanks -- I was struggling for AGES with this. The documentation is as clear as mud...

  • Anonymous
    July 28, 2009
    Thanks, this helped me out. Was wondering why my XPath was not working till I stumbled on this post.

  • Anonymous
    December 18, 2009
    While I understand that the prefix for the xpath query is controlled by the XmlNamespaceManager and can be different than the prefixes used in the xml itself, it disturbs me that one can set the "default namespace" for the XmlNamespaceManager and that default is ignored in the xpath query.  This to me is a bug in the implementation, which should be corrected to avoid untold hours of frustration by developers attempting to discover this workaround. Thanks for your post, it did indeed shorten the amount of time that I was frustrated.

  • Anonymous
    August 25, 2010
    Thanks a lot!

  • Anonymous
    April 25, 2013
    Great post. This had me stumped.

  • Anonymous
    October 29, 2013
    Great explanation. Thanks!

  • Anonymous
    December 19, 2013
    What about if the XML doesn't have any namespace to refer to?

  • Anonymous
    December 19, 2013
    Elsa, I'm not sure I understand your question. If there is no namespace set, doesn't the node query just work without any prefix? So in my example, XmlNode metaNode = resultsXhtml.SelectSingleNode("//title"); would actually return the title node if the XML had no namespaces set. John

  • Anonymous
    March 05, 2014
    Here the namespace is hardcoded. Can we process multiple XML docs that have the same child node title but different namespaces (the namespace will be decided at runtime)? For example:
    XML Document 1: <Root xmlns="http://abc.com/example"> </Root>
    XML Document 2: <Root xmlns="http://xyz.com/example2"> <title> Title 2 </title> </Root>

  • Anonymous
    March 05, 2014
    Manish - it's been a long time since I wrote this article (and I don't do much XML programming nowadays!), but I think the answer is that because the namespaces of the two documents are different, the documents aren't actually the same. Just because they look similar, with the same structure, the different namespaces mean they are actually different.
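
For the runtime-namespace question above, one possible approach is to read the namespace URI from the document's root element and register that, rather than hardcoding it. This is only a sketch under assumptions: the class RuntimeNamespaceDemo, the helper SelectTitle, the prefix "d" and the "Title 1" text in the first sample are hypothetical, and it assumes the title element is in the same namespace as the root element.

```csharp
using System;
using System.Xml;

public static class RuntimeNamespaceDemo
{
    // Returns the text of the first <title> that shares the root element's
    // namespace, or null when there is none.
    public static string SelectTitle(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        // Read the namespace URI from the document itself instead of hardcoding it.
        string rootNamespace = doc.DocumentElement.NamespaceURI;

        XmlNode node;
        if (rootNamespace.Length == 0)
        {
            // No namespace on the root: the plain unprefixed query applies.
            node = doc.SelectSingleNode("//title");
        }
        else
        {
            XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
            ns.AddNamespace("d", rootNamespace); // "d" is an arbitrary prefix
            node = doc.SelectSingleNode("//d:title", ns);
        }
        return node == null ? null : node.InnerText;
    }

    public static void Main()
    {
        // Hypothetical documents modelled on the two in the question.
        string first = "<Root xmlns=\"http://abc.com/example\"><title>Title 1</title></Root>";
        string second = "<Root xmlns=\"http://xyz.com/example2\"><title>Title 2</title></Root>";

        Console.WriteLine(SelectTitle(first));  // prints "Title 1"
        Console.WriteLine(SelectTitle(second)); // prints "Title 2"
    }
}
```

If the namespace really should be ignored altogether, an XPath such as //*[local-name()='title'] matches a title element in any namespace, at the cost of also matching title elements from vocabularies you did not intend.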