Question

Richard E. Martino on Fri, 29 Dec 2017 13:48:30


I have several one-page PDFs of scanned pictures, and I no longer have the original pictures.

I assume, or hope, that the internal PDF file format is simply some image format such as BMP, JPG, GIF, TIFF, PNG or any other common image format.  If not and you know of an algorithm to convert, I would appreciate that knowledge, please.

How can I write a C# program to open the PDF, even as a byte array, and extract the image itself?


--Richard Martino





Replies

CoolDadTx on Fri, 29 Dec 2017 14:39:49


PDFs are not just images; a PDF is a container format, and a scanned page is normally stored inside it as an embedded image object. The spec is available from Adobe. There are third-party libraries that can convert a PDF to a TIFF, after which you can treat each page as an image.

iTextSharp is supposed to be able to extract images from within a PDF. I've never tried it but here's a link to a post about how someone else did.
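For readers who want to see what "an image inside a PDF" looks like at the byte level, here is a minimal sketch that uses no third-party library. It assumes the scanner stored each page as a JPEG in a stream object whose dictionary names the /DCTDecode filter, which is common for scanned PDFs but not guaranteed. The class and method names (PdfJpegScanner, ExtractJpegs) are made up for illustration. The sketch does not handle images compressed with other filters (for example /FlateDecode or /CCITTFaxDecode), encrypted files, or dictionaries hidden inside compressed object streams, and it does not parse the cross-reference table.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class PdfJpegScanner
{
    // Returns the raw bytes of every stream that follows a /DCTDecode
    // filter entry and starts with the JPEG start-of-image marker FF D8.
    public static List<byte[]> ExtractJpegs(byte[] pdf)
    {
        var results = new List<byte[]>();
        byte[] filter = Encoding.ASCII.GetBytes("/DCTDecode");
        byte[] streamKw = Encoding.ASCII.GetBytes("stream");
        byte[] endKw = Encoding.ASCII.GetBytes("endstream");

        int pos = 0;
        while ((pos = IndexOf(pdf, filter, pos)) >= 0)
        {
            int start = IndexOf(pdf, streamKw, pos);
            if (start < 0) break;
            start += streamKw.Length;
            // The "stream" keyword is followed by CRLF or LF before the data.
            if (start < pdf.Length && pdf[start] == (byte)'\r') start++;
            if (start < pdf.Length && pdf[start] == (byte)'\n') start++;

            int end = IndexOf(pdf, endKw, start);
            if (end < 0) break;
            // Drop the end-of-line marker that precedes "endstream".
            int dataEnd = end;
            if (dataEnd > start && pdf[dataEnd - 1] == (byte)'\n') dataEnd--;
            if (dataEnd > start && pdf[dataEnd - 1] == (byte)'\r') dataEnd--;

            // Keep the data only if it really looks like a JPEG.
            if (dataEnd - start >= 2 && pdf[start] == 0xFF && pdf[start + 1] == 0xD8)
            {
                var jpeg = new byte[dataEnd - start];
                Array.Copy(pdf, start, jpeg, 0, jpeg.Length);
                results.Add(jpeg);
            }
            pos = end + endKw.Length;
        }
        return results;
    }

    // Naive byte-pattern search; returns -1 when the pattern is not found.
    static int IndexOf(byte[] haystack, byte[] needle, int from)
    {
        for (int i = from; i <= haystack.Length - needle.Length; i++)
        {
            int j = 0;
            while (j < needle.Length && haystack[i + j] == needle[j]) j++;
            if (j == needle.Length) return i;
        }
        return -1;
    }
}
```

A caller could read the file with File.ReadAllBytes, pass the array to ExtractJpegs, and write each returned array to disk with File.WriteAllBytes using a .jpg extension. A more robust version would use the /Length entry of the stream dictionary instead of searching for "endstream".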

Fei Hu on Mon, 01 Jan 2018 10:29:17


Hello Richard,

>>If not and you know of an algorithm to convert, I would appreciate that knowledge, please.

I am not proficient in conversion algorithms, but you could use a PDF converter SDK or a third-party library to achieve this. Here is a simple example that uses a PDF SDK for .NET:

// Load a PDF file.
String inputFilePath = Program.RootPath + "\\" + "1.pdf";
PDFDocument doc = new PDFDocument(inputFilePath);

// Get the first page of PDF file.
PDFPage page = (PDFPage)doc.GetPage(0);

// Convert the first PDF page to a JPEG file.
page.ConvertToImage(ImageType.JPEG, Program.RootPath + "\\Output.jpg");

You could also consider the other options described at the link below.

https://stackoverflow.com/questions/23905169/how-to-convert-pdf-files-to-image

Best regards,

Neil Hu



Ezreal93 on Tue, 02 Jan 2018 02:11:59


Hi, you can use Spire.PDF to load each one-page PDF from a byte array and save the page as a picture in a common image format.

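// bytes is a byte[] holding the PDF file contents, e.g. from File.ReadAllBytes.
PdfDocument doc = new PdfDocument();
doc.LoadFromBytes(bytes);
// Render the first page (index 0); the return type depends on the Spire.PDF build.
var image = doc.SaveAsImage(0, PdfImageType.Bitmap);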


 

Richard E. Martino on Wed, 03 Jan 2018 12:45:43


I thank all of you for your replies.

For my use, I especially thank Michael Taylor for stating, "The spec is available from Adobe."  I found it here:

https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf

I will start with section 7.5 File Structure, and work from there.
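As a starting point for that section, here is a minimal sketch of the two fixed landmarks it describes: the "%PDF-x.y" header on the first line and the "startxref" keyword near the end of the file, which is followed by the byte offset of the most recent cross-reference section and then "%%EOF". The class and method names (PdfFileStructure, ReadVersion, ReadStartXref) are hypothetical, and the 1024-byte tail window is an arbitrary choice for this sketch, not something the spec requires.

```csharp
using System;
using System.Text;

static class PdfFileStructure
{
    // Returns the version from the "%PDF-x.y" header, or null if absent.
    public static string ReadVersion(byte[] pdf)
    {
        string head = Encoding.ASCII.GetString(pdf, 0, Math.Min(pdf.Length, 16));
        if (!head.StartsWith("%PDF-", StringComparison.Ordinal)) return null;
        int end = head.IndexOfAny(new[] { '\r', '\n', ' ' });
        return end < 0 ? head.Substring(5) : head.Substring(5, end - 5);
    }

    // Returns the byte offset that follows the last "startxref" keyword,
    // or -1 if the keyword or the number cannot be found.
    public static long ReadStartXref(byte[] pdf)
    {
        int tailLen = Math.Min(pdf.Length, 1024);
        string tail = Encoding.ASCII.GetString(pdf, pdf.Length - tailLen, tailLen);
        int kw = tail.LastIndexOf("startxref", StringComparison.Ordinal);
        if (kw < 0) return -1;

        // Skip the end-of-line after the keyword, then read the decimal digits.
        string rest = tail.Substring(kw + "startxref".Length).TrimStart();
        int len = 0;
        while (len < rest.Length && rest[len] >= '0' && rest[len] <= '9') len++;
        return len > 0 && long.TryParse(rest.Substring(0, len), out long offset)
            ? offset
            : -1;
    }
}
```

From the offset returned by ReadStartXref, the next step would be to read the cross-reference table (or cross-reference stream in newer files) at that position, which gives the byte offset of every indirect object, including the image objects.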

I am leery of third party packages without their source code because I do not want to execute unknown software on my computer.  I may reinvent the wheel, but I will know exactly what this wheel does ... plus I can improve it if I want to.

Thanks again for your help.