Convert Word to HTML with Track Changes

Category: open xml format sdk

Question

Tech Aspirant on Sat, 07 Jan 2017 18:26:36


Hello,

I want to convert my Word document to HTML with the tracked changes highlighted. I am able to convert the Word document to HTML, but the text that has tracked changes comes out as plain text in the HTML. I want to show it with a yellow background colour in the HTML. Please help.

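            // Namespaces used below: System.IO, System.Drawing.Imaging, System.Xml.Linq,
            // DocumentFormat.OpenXml.Packaging, and OpenXmlPowerTools (HtmlConverter, Xhtml, NoNamespace).
            byte[] byteArray = File.ReadAllBytes(@"C:\Users\admin\Desktop\New Microsoft Word Document.docx");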

            using (MemoryStream memoryStream = new MemoryStream())
            {
                memoryStream.Write(byteArray, 0, byteArray.Length);
                using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
                {
                    int imageCounter = 0;
                    HtmlConverterSettings settings = new HtmlConverterSettings()
                    {
                        PageTitle = "My Page Title",
                        ImageHandler = imageInfo =>
                        {
                            DirectoryInfo localDirInfo = new DirectoryInfo(@"C:\Users\admin\Desktop\img");
                            if (!localDirInfo.Exists)
                            {
                                localDirInfo.Create();
                            }
                            ++imageCounter;
                            string extension = imageInfo.ContentType.Split('/')[1].ToLower();
                            ImageFormat imageFormat = null;
                            if (extension == "png")
                            {
                                extension = "gif";
                                imageFormat = ImageFormat.Gif;
                            }
                            else if (extension == "gif")
                                imageFormat = ImageFormat.Gif;
                            else if (extension == "bmp")
                                imageFormat = ImageFormat.Bmp;
                            else if (extension == "jpeg")
                                imageFormat = ImageFormat.Jpeg;
                            else if (extension == "tiff")
                            {
                                extension = "gif";
                                imageFormat = ImageFormat.Gif;
                            }
                            else if (extension == "x-wmf")
                            {
                                extension = "wmf";
                                imageFormat = ImageFormat.Wmf;
                            }
                            if (imageFormat == null)
                                return null;

                            string imageFileName = @"C:/Users/admin/Desktop/img/image" +
                                imageCounter.ToString() + "." + extension;
                            try
                            {
                                imageInfo.Bitmap.Save(imageFileName, imageFormat);
                            }
                            catch (System.Runtime.InteropServices.ExternalException)
                            {
                                return null;
                            }
                            XElement img = new XElement(Xhtml.img,
                                new XAttribute(NoNamespace.src, imageFileName),
                                imageInfo.ImgStyleAttribute,
                                imageInfo.AltText != null ?
                                    new XAttribute(NoNamespace.alt, imageInfo.AltText) : null);
                            return img;
                        }
                    };
                    XElement html = HtmlConverter.ConvertToHtml(doc, settings);
                    File.WriteAllText(@"C:\Users\admin\Desktop\New Microsoft Word Document.html", html.ToStringNewLineOnAttributes());
                }
            }

Replies

Kristin Xie on Mon, 09 Jan 2017 06:58:22


Hi Tech,

Based on your code, this case is more closely related to Open XML, so I will move it to that forum for better support.

Best regards,

Kristin

Edward8520 on Tue, 10 Jan 2017 06:29:33


Hi Tech,

How did you create the tracked changes? Do they show up in the HTML at all? If they do, I think you could style the corresponding HTML tags with CSS.

Here is a simple example:

            byte[] byteArray = File.ReadAllBytes(@"D:\OfficeDev\Word\201701\Test.docx");
            string css = @"
        p.PtNormal
            {margin-bottom:10.0pt;
            font-size:11.0pt;
            font-family:""Times"";}
        span.PtDefaultParagraphFont
            {margin-top:24.0pt;
            font-size:14.0pt;
            font-family:""Helvetica"";
            color:yellow;}
        h1.PtHeading1
            {margin-top:24.0pt;
            font-size:14.0pt;
            font-family:""Helvetica"";
            color:blue;}
        h2.PtHeading2
            {margin-top:10.0pt;
            font-size:13.0pt;
            font-family:""Helvetica"";
            color:blue;}";
            using (MemoryStream memoryStream = new MemoryStream())
            {
                memoryStream.Write(byteArray, 0, byteArray.Length);
                using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
                {
                    int imageCounter = 0;
                    HtmlConverterSettings settings = new HtmlConverterSettings()
                    {
                        PageTitle = "My Page Title",

                        CssClassPrefix = "Pt",
                        AdditionalCss= css,

You could refer to the link below for more information.
Transforming Open XML WordprocessingML to XHTML Using the Open XML SDK 2.0
https://msdn.microsoft.com/en-us/library/office/ff628051(v=office.14).aspx
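
A CSS rule can only help if the tracked changes survive as distinct markup in the generated HTML. As far as I know, HtmlConverter accepts tracked revisions before converting, which would explain why inserted text comes out as plain text; please verify this against your PowerTools version. One possible workaround is to mark the inserted runs before the conversion. The sketch below uses only LINQ to XML: it adds a yellow w:highlight to every run inside a w:ins element. The class and method names (RevisionHighlighter, HighlightInsertedRuns) are made up for illustration, and whether the converter renders w:highlight as a background colour is an assumption you would need to test.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class RevisionHighlighter
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Adds <w:highlight w:val="yellow"/> to every run (w:r) inside a tracked
    // insertion (w:ins). Returns the number of runs visited.
    // Note: the highlight is appended at the end of w:rPr; strict schema order
    // is not enforced, which is acceptable only for an in-memory working copy.
    public static int HighlightInsertedRuns(XElement root)
    {
        int count = 0;
        var runs = root.Descendants(W + "ins")
                       .SelectMany(ins => ins.Descendants(W + "r"))
                       .Distinct()
                       .ToList();
        foreach (XElement run in runs)
        {
            XElement rPr = run.Element(W + "rPr");
            if (rPr == null)
            {
                // Run properties must be the first child of the run.
                rPr = new XElement(W + "rPr");
                run.AddFirst(rPr);
            }
            if (rPr.Element(W + "highlight") == null)
            {
                rPr.Add(new XElement(W + "highlight",
                    new XAttribute(W + "val", "yellow")));
            }
            count++;
        }
        return count;
    }
}
```

To try it, load the XML of the main document part into an XDocument, call HighlightInsertedRuns on its root, write the XML back to the part, and only then call HtmlConverter.ConvertToHtml on the in-memory copy.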

Best Regards,

Edward

                       

Tech Aspirant on Wed, 11 Jan 2017 03:56:09


Hello,

I am able to detect whether a document contains tracked revisions, but I can't find which lines have the tracked changes. Below is the code I use to detect them:

 public static System.Type[] trackedRevisionsElements = new System.Type[] {
        typeof(CellDeletion),
        typeof(CellInsertion),
        typeof(CellMerge),
        typeof(CustomXmlDelRangeEnd),
        typeof(CustomXmlDelRangeStart),
        typeof(CustomXmlInsRangeEnd),
        typeof(CustomXmlInsRangeStart),
        typeof(Deleted),
        typeof(DeletedFieldCode),
        typeof(DeletedMathControl),
        typeof(DeletedRun),
        typeof(DeletedText),
        typeof(Inserted),
        typeof(InsertedMathControl),
        typeof(InsertedRun),
        typeof(MoveFrom),
        typeof(MoveFromRangeEnd),
        typeof(MoveFromRangeStart),
        typeof(MoveTo),
        typeof(MoveToRangeEnd),
        typeof(MoveToRangeStart),
        typeof(MoveToRun),
        typeof(NumberingChange),
        typeof(ParagraphMarkRunPropertiesChange),
        typeof(ParagraphPropertiesChange),
        typeof(RunPropertiesChange),
        typeof(SectionPropertiesChange),
        typeof(TableCellPropertiesChange),
        typeof(TableGridChange),
        typeof(TablePropertiesChange),
        typeof(TablePropertyExceptionsChange),
        typeof(TableRowPropertiesChange),
    };

        public static bool PartHasTrackedRevisions(OpenXmlPart part)
        {
            // Tracked insertion marks found in this part (currently only inspected while debugging).
            List<OpenXmlElement> insertions =
                part.RootElement.Descendants<Inserted>()
                .Cast<OpenXmlElement>().ToList();

            return part.RootElement.Descendants()
                .Any(e => trackedRevisionsElements.Contains(e.GetType()));
        }

        public static bool HasTrackedRevisions(WordprocessingDocument doc)
        {
            if (PartHasTrackedRevisions(doc.MainDocumentPart))
                return true;
            foreach (var part in doc.MainDocumentPart.HeaderParts)
                if (PartHasTrackedRevisions(part))
                    return true;
            foreach (var part in doc.MainDocumentPart.FooterParts)
                if (PartHasTrackedRevisions(part))
                    return true;
            if (doc.MainDocumentPart.EndnotesPart != null)
                if (PartHasTrackedRevisions(doc.MainDocumentPart.EndnotesPart))
                    return true;
            if (doc.MainDocumentPart.FootnotesPart != null)
                if (PartHasTrackedRevisions(doc.MainDocumentPart.FootnotesPart))
                    return true;
            return false;
        }

        private void button2_Click(object sender, EventArgs e)
        {
            foreach (var documentName in Directory.GetFiles(".", "*.docx"))
            {
                using (WordprocessingDocument wordDoc =
                    WordprocessingDocument.Open(documentName, false))
                {
                    if (HasTrackedRevisions(wordDoc))
                        Console.WriteLine("{0} contains tracked revisions", documentName);
                    else
                        Console.WriteLine("{0} does not contain tracked revisions", documentName);
                }
            }

        }
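
To go from detecting revisions to locating them, one option is to walk the underlying WordprocessingML directly. The sketch below is a minimal illustration using only LINQ to XML, not a definitive implementation: it lists every w:ins and w:del element that wraps run content, along with its author and text. The names RevisionLister and ListRevisions are hypothetical. In the SDK's typed classes, run-level insertions and deletions are, as far as I know, InsertedRun and DeletedRun, while Inserted marks an inserted paragraph mark, which may be why Descendants<Inserted>() finds so little.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RevisionLister
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one entry per tracked insertion (w:ins) or deletion (w:del)
    // that contains text, in document order.
    public static List<string> ListRevisions(XElement root)
    {
        var result = new List<string>();
        var revisions = root.Descendants()
            .Where(e => e.Name == W + "ins" || e.Name == W + "del");
        foreach (XElement rev in revisions)
        {
            // Inserted text is stored in w:t, deleted text in w:delText.
            string text = string.Concat(rev.Descendants()
                .Where(e => e.Name == W + "t" || e.Name == W + "delText")
                .Select(e => e.Value));
            if (text.Length == 0)
                continue; // e.g. a paragraph-mark revision inside w:rPr

            string kind = rev.Name.LocalName == "ins" ? "inserted" : "deleted";
            string author = (string)rev.Attribute(W + "author") ?? "(unknown)";
            result.Add(kind + " by " + author + ": " + text);
        }
        return result;
    }
}
```

Passing the root element of the main document part's XML to ListRevisions yields the revised text itself, which can then be matched against the paragraphs it came from.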

Edward8520 on Wed, 11 Jan 2017 09:18:48


Hi Tech,

Could you share the generated HTML file with us? Did the tracked changes appear in it? Can you find a class for the tracked changes in the HTML, and if so, can you set a CSS style on it?

Best Regards,

Edward