Question

Vincent Rich on Mon, 17 Sep 2012 11:22:33


Hi,

I have an XML document, shown below:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head>
    <meta charset="utf-8" />
  </head>
  <body>
    <nav epub:type="foobar">
    </nav>

    <nav epub:type="toc">
      <ol>
        <li>
          <a href="heftywater.xhtml#title">Hefty Water</a>
          <ol>
            <li>
              <a href="heftywater.xhtml#switch">The Switch</a>
            </li>
            <li>
              <a href="heftywater.xhtml#source">The Source</a>
            </li>
            <li>
              <a href="heftywater.xhtml#ruby">Hefty Ruby Water</a>
            </li>
          </ol>
        </li>
      </ol>
    </nav>
  </body>
</html>

The problem I have is with namespaces. I start by extracting the default namespace like this:

string navContent = FileSystem.ReadContent(item.HRef);
XDocument xDoc = XDocument.Parse(navContent);

// Extract the document namespace because we need it for our LINQ queries
var ns = xDoc.Document.Root.AttributeOrEmpty("xmlns");
if (ns.StartsWith("http:")) { ns = string.Format("{{{0}}}", ns); }

var nav = xDoc.Root.Element(ns + "nav").Descendants();
...

But it doesn't work, maybe because I am not using the right epub namespace.

I am also wondering what the best way is to get the nav epub:type="toc" node.

What I would like to do is populate an abstract data structure that represents a table of contents, something like:

public class EpubTocItem
{
    public string ID { get; set; }

    public List<string> Label { get; set; }

    public string Href { get; set; }

    readonly List<EpubTocItem> _children = new List<EpubTocItem>();
    public IList<EpubTocItem> Children
    {
        get { return _children; }
    }
}



Replies

Joon84 on Mon, 17 Sep 2012 11:57:55


Well, using XmlReader I can read the XML (I saved it to the C: drive as a.xml and then read it):

using (XmlReader reader = XmlReader.Create(@"C:\a.xml"))
{
    while (reader.Read())
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                if (reader.Name == "nav")
                {
                    //string strAttribute = reader.GetAttribute("epub:type");
                    string strAttribute = reader.GetAttribute(0);
                    string strValue = reader.ReadString();
                }
                break;
        }
    }
}

regards

joon
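
As a follow-up to the XmlReader approach above, here is a minimal sketch that matches elements and attributes by namespace URI rather than by position or prefix. The class name NavReaderSketch and the method ReadNavTypes are hypothetical names chosen for illustration. The sketch assumes the document uses the XHTML and EPUB namespaces shown in the question.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of any library.
public static class NavReaderSketch
{
    const string XhtmlNs = "http://www.w3.org/1999/xhtml";
    const string EpubNs = "http://www.idpf.org/2007/ops";

    // Returns the epub:type value of every XHTML <nav> element, in document order.
    public static List<string> ReadNavTypes(string xml)
    {
        var types = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // Compare local name and namespace URI, so the check does not
                // depend on which prefix the document happens to use.
                if (reader.NodeType == XmlNodeType.Element
                    && reader.LocalName == "nav"
                    && reader.NamespaceURI == XhtmlNs)
                {
                    // GetAttribute(localName, namespaceURI) returns null when
                    // the attribute is absent.
                    types.Add(reader.GetAttribute("type", EpubNs));
                }
            }
        }
        return types;
    }
}
```

Looking the attribute up by local name and namespace URI avoids relying on GetAttribute(0), which only works while epub:type happens to be the first attribute.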

Vincent Rich on Mon, 17 Sep 2012 12:51:34


OK, but what if I want something a bit faster, smarter, or different?
I would be curious to see the answer in LINQ, if it applies, or maybe with XPath.

Wyck on Mon, 17 Sep 2012 13:55:25


I think your problem is that you have a default (anonymous) namespace. This is perfectly legal XML, but XPath 1.0 has no notion of a default namespace: an unprefixed name in an XPath expression always refers to an element in no namespace.

Basically, the declaration below defines two namespaces: one default/anonymous ("http://www.w3.org/1999/xhtml") and one explicitly named (epub="http://www.idpf.org/2007/ops"):

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">

To do an xpath query on it, you'll have to assign a name to the default namespace and use the name explicitly in your query.

So for example we can assign the name "foo" to the namespace "http://www.w3.org/1999/xhtml".  Then we can query that node using the following xpath where everything that is in the default namespace is now explicitly qualified with "foo".

//foo:nav[@epub:type="toc"]

(There are other XPath queries that would work.  The "all descendants" syntax, i.e.: double slash //,  is just what I chose for simplicity.)

Here's code that I tested:

string text = File.ReadAllText( @"XMLFile1.xml" );
XDocument doc = XDocument.Parse( text );
XmlNamespaceManager namespaceManager = new XmlNamespaceManager( new NameTable() );
namespaceManager.AddNamespace( "foo", @"http://www.w3.org/1999/xhtml" );
namespaceManager.AddNamespace( "epub", @"http://www.idpf.org/2007/ops" );

// Literally:  //foo:nav[@epub:type="toc"]
string xpath = @"//foo:nav[@epub:type=""toc""]";
var tocNode = doc.XPathSelectElement( xpath, namespaceManager );

NOTE: One particular gotcha is that you have to add:  using System.Xml.XPath; to get access to XPathSelectElement.

See also: How to use XPath with XDocument for more specific details on these techniques, and, in particular, how to make use of LINQ.
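
To illustrate the remark that other XPath queries would work, here is a minimal sketch of two alternatives to the // form: an explicit path from the root, and a query that uses local-name() and namespace-uri() so that no namespace manager is needed. The class and method names are hypothetical, and the prefix "x" is an arbitrary choice, just like "foo" above.

```csharp
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper, not part of any library.
public static class TocXPathSketch
{
    // Explicit path from the root; every XHTML element still needs a prefix.
    public static XElement FindByExplicitPath(XDocument doc)
    {
        var manager = new XmlNamespaceManager(new NameTable());
        manager.AddNamespace("x", "http://www.w3.org/1999/xhtml");
        manager.AddNamespace("epub", "http://www.idpf.org/2007/ops");
        return doc.XPathSelectElement(
            "/x:html/x:body/x:nav[@epub:type='toc']", manager);
    }

    // No prefixes at all: match on local names and the attribute's namespace URI.
    public static XElement FindWithoutPrefixes(XDocument doc)
    {
        const string xpath =
            "//*[local-name()='nav' and " +
            "@*[local-name()='type' and " +
            "namespace-uri()='http://www.idpf.org/2007/ops']='toc']";
        return doc.XPathSelectElement(xpath);
    }
}
```

The prefix-free form is more verbose and matches a nav element in any namespace, so the explicit prefixes are usually easier to read.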


Vincent Rich on Mon, 17 Sep 2012 21:30:40


Wow, thanks for such a complete and well-documented answer.

Louis.fr on Tue, 18 Sep 2012 13:29:45


Since you weren't using XPath, there is no need for adding any namespace manager. Using the {namespace} syntax is sufficient.

Your problem is that you're searching for a 'nav' element under the root, while 'nav' is under 'body' which is under the root.

Either use multiple Element calls, or Descendants:

var nav = xDoc.Root.Element(ns + "body").Elements(ns + "nav").Descendants();

or

var nav = xDoc.Root.Descendants(ns + "nav").Descendants();
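
For the specific nav epub:type="toc" node, the same idea can be sketched with XNamespace values and an attribute filter. This is a minimal sketch, assuming the two namespace URIs from the question. The class TocLinqSketch and its method FindTocNav are hypothetical names. XElement.GetDefaultNamespace() is used here as a built-in alternative to reading the xmlns attribute by hand.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class TocLinqSketch
{
    static readonly XNamespace Epub = "http://www.idpf.org/2007/ops";

    // Returns the first <nav> whose epub:type attribute is "toc", or null.
    public static XElement FindTocNav(XDocument doc)
    {
        // The default namespace declared on the root (XHTML in the question).
        XNamespace ns = doc.Root.GetDefaultNamespace();

        return doc.Root
            .Descendants(ns + "nav")
            // Casting a missing attribute to string yields null, so nav
            // elements without epub:type are simply skipped.
            .FirstOrDefault(nav => (string)nav.Attribute(Epub + "type") == "toc");
    }
}
```

Adding an XNamespace to a local name produces the same XName as the "{namespace}name" string form, so either style works with Element and Descendants.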

Vincent Rich on Thu, 20 Sep 2012 10:04:10


Thanks, now I have several ways to select a node.

Now I just have to find the easiest way to transform these XML nodes into a nested structure:

<ol xmlns="http://www.w3.org/1999/xhtml">
  <li>
    <a href="heftywater.xhtml#title">Hefty Water</a>
    <ol>
      <li>
        <a href="heftywater.xhtml#switch">The Switch</a>
      </li>
      <li>
        <a href="heftywater.xhtml#source">The Source</a>
      </li>
      <li>
        <a href="heftywater.xhtml#ruby">Hefty Ruby Water</a>
      </li>
    </ol>
  </li>
</ol>

and my nested structure :

public class EpubToc
{
    public List<EpubTocItem> Items { get; set; }

    public EpubToc()
    {
        Items = new List<EpubTocItem>();
    }
}

public class EpubTocItem
{
    public List<string> Text { get; set; }

    public string HRef { get; set; }

    readonly List<EpubTocItem> _children = new List<EpubTocItem>();
    public IList<EpubTocItem> Children
    {
        get { return _children; }
    }
}

XElement tocNode = xDoc.XPathSelectElement(xpath, namespaceManager);
Toc = new EpubToc();
BuildTocFromNav(Toc.Items, tocNode, 0);


void BuildTocFromNav(IList<EpubTocItem> tocItem, XElement element, int depth)
{
    // For simplicity, argument validation not performed

    if (!element.HasElements)
    {
    }
    else
    {
        depth++;
        foreach (XElement child in element.Elements())
        {
            BuildTocFromNav(tocItem, child, depth);
        }
        depth--;
    }
}

But with your answer I would start with something like:

var nav = xDoc.Root.Descendants(nsDefault.FormatNameSpace() + "nav").Descendants();
Toc = new EpubToc();
BuildTocFromNav(Toc.Items, nav, 0);

void BuildTocFromNav(IList<EpubTocItem> tocItem, IEnumerable<XElement> element, int depth)
{

}

Which is the right approach?
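
One possible approach, as a minimal sketch rather than a definitive answer: recurse over the ol/li structure directly instead of over a flat Descendants() sequence, so that each li becomes one item and its nested ol becomes that item's children. TocEntry is a hypothetical stand-in for EpubTocItem with a single string label, and TocBuilderSketch.BuildFromList is a hypothetical name. The sketch assumes every li holds at most one a element and at most one nested ol, as in the sample above.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical stand-in for EpubTocItem, with a single string label.
public class TocEntry
{
    public string Text { get; set; }
    public string HRef { get; set; }
    public List<TocEntry> Children { get; } = new List<TocEntry>();
}

// Hypothetical helper, not part of any library.
public static class TocBuilderSketch
{
    static readonly XNamespace Xhtml = "http://www.w3.org/1999/xhtml";

    // Converts the <li> children of one <ol> into entries,
    // recursing into any nested <ol>. A null <ol> yields an empty list.
    public static List<TocEntry> BuildFromList(XElement ol)
    {
        var entries = new List<TocEntry>();
        if (ol == null) return entries;

        foreach (XElement li in ol.Elements(Xhtml + "li"))
        {
            XElement link = li.Element(Xhtml + "a");
            var entry = new TocEntry
            {
                Text = link != null ? link.Value.Trim() : null,
                // href is unprefixed, so it is in no namespace.
                HRef = link != null ? (string)link.Attribute("href") : null
            };
            entry.Children.AddRange(BuildFromList(li.Element(Xhtml + "ol")));
            entries.Add(entry);
        }
        return entries;
    }
}
```

Given the nav element as tocNode, the call would look like TocBuilderSketch.BuildFromList(tocNode.Element(ns + "ol")), where ns is the XHTML namespace. Working from Elements() at each level keeps the nesting intact, whereas Descendants() flattens it and makes the depth bookkeeping harder.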