Reading Large excel file using SAX Approach.

Category: open xml format sdk

Question

meKrishna on Fri, 21 Oct 2011 14:04:44


I  have large excel file (approx 500,000 rows) and i have to validate each column of the excel file. I want to use SAX approach to read excel file. The problem i am facing is , my excel file contains int, float, char, string value and i don't know how to read different data Types using SAX approach. 

Any suggestion is appreciated.

Replies

Bruce Song on Mon, 24 Oct 2011 03:54:15


Hi Krishna,

Thank you for posting. We are doing the research about the problem, there might be some delay about the response. Appreciate your patience.

Best Regards,

Will Buffington [MSFT] on Mon, 24 Oct 2011 21:47:54


Here is a C# code sample using the SAX approach for reach an Excel file:

WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();

OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
string text;
while (reader.Read())
{
    if (reader.ElementType == typeof(CellValue))
    {
        text = reader.GetText();
        Console.Write(text + " ");
    }
}

And here is a VB approach for doing the same:

Dim workbookPart As WorkbookPart = spreadsheetDocument.WorkbookPart
Dim worksheetPart As WorksheetPart = workbookPart.WorksheetParts.First()

Dim reader As OpenXmlReader = OpenXmlReader.Create(worksheetPart)
Dim text As String
While reader.Read()
    If reader.ElementType = GetType(CellValue) Then
        text = reader.GetText()
        Console.Write(text & " ")
    End If
End While

jimsurf on Thu, 14 Dec 2017 23:27:43


GetText doesn't return the value of strings it returns an index to the shared string table.

This is what I am doing, but I am not sure about the performance of it:

var c = (Cell)reader.LoadCurrentElement();
                        var cellValue = GetCellValue(c, workbookPart);

        private string GetCellValue(Cell c, WorkbookPart workbookPart)
        {
            if (c.DataType != null && c.DataType == CellValues.SharedString)
            {
                return workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>()
                    .ElementAt(int.Parse(c.CellValue.InnerText)).Text.Text;

            }

            return c.CellValue != null ? c.CellValue.InnerText : string.Empty;
        }