Channel: SCN : Popular Discussions - Java Development

How to parse HTML into well-formed HTML and convert it to PDF


Dear friends,

I have a requirement to convert XML to PDF. I tried several tools (Flying Saucer, iText, Apache FOP, and others), but the XML and XSLT I generate come out with errors, so I could not convert them to PDF directly. Instead, I first transform the XML and XSL to HTML, and I am now trying to convert the generated HTML to PDF.

The generated HTML is not well-formed, so I could not convert it to PDF either. I found a solution called Tidy, using the following page:

http://infohound.net/tidy/tidy.pl

When I convert my HTML to well-formed HTML with that page, the PDF conversion works fine. But when I convert the same HTML to well-formed HTML with the JAR that Tidy provides (org.w3c.tidy), it does not work and I get the following error:

 

Exception in thread "main" com.itextpdf.tool.xml.exceptions.RuntimeWorkerException: Invalid nested tag head found, expected closing tag link.
    at com.itextpdf.tool.xml.XMLWorker.endElement(XMLWorker.java:134)
    at com.itextpdf.tool.xml.parser.XMLParser.endElement(XMLParser.java:395)
    at com.itextpdf.tool.xml.parser.state.ClosingTagState.process(ClosingTagState.java:70)
    at com.itextpdf.tool.xml.parser.XMLParser.parseWithReader(XMLParser.java:235)
    at com.itextpdf.tool.xml.parser.XMLParser.parse(XMLParser.java:213)
    at com.itextpdf.tool.xml.parser.XMLParser.parse(XMLParser.java:174)
    at com.itextpdf.tool.xml.XMLWorkerHelper.parseXHtml(XMLWorkerHelper.java:223)
    at com.itextpdf.tool.xml.XMLWorkerHelper.parseXHtml(XMLWorkerHelper.java:185)
    at html_to_pdf_xml_worker.html_to_pdf.main(html_to_pdf.java:22)

 

My code that converts the HTML is:

 

 

import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;

public class set_html {
    public static void main(String[] args) {
        try {
            FileInputStream FIS = new FileInputStream("sample.html");
            FileOutputStream FOS = new FileOutputStream("my_xml.xhtml");
            Tidy T = new Tidy();
            Document D = T.parseDOM(FIS, FOS);
        } catch (java.io.FileNotFoundException e) {
            System.out.println(e.getMessage());
        }
    }
}

 

And my HTML-to-PDF converter code is:

 

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;

public class html_to_pdf {
    public static void main(String[] args) throws DocumentException, IOException {
        // step 1
        Document document = new Document();
        // step 2
        PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("pdf.pdf"));
        // step 3
        document.open();
        // step 4
        XMLWorkerHelper.getInstance().parseXHtml(writer, document,
                new FileInputStream("my_xml.xhtml"));
        // step 5
        document.close();
        System.out.println("PDF Created!");
    }
}
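
One way to narrow this down: the exception says the parser reached the closing head tag while a link tag was still open, which suggests the file written by Tidy still contains an HTML-style unclosed <link ...> rather than the XHTML form <link ... />. The sketch below uses only the JDK's built-in XML parser to check whether markup is well-formed XML and to report the first error. The class name WellFormedCheck and the method firstXmlError are made up for illustration, and the two inline samples are hypothetical stand-ins for the real file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of iText or Tidy.
public class WellFormedCheck {

    /** Returns null if the markup parses as XML, otherwise a description of the first fatal error. */
    public static String firstXmlError(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Resolve any external DTD (for example the XHTML DOCTYPE) to an empty
            // document so the check never goes to the network.
            builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            // DefaultHandler ignores warnings and rethrows fatal errors without printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Check a real file, e.g. the my_xml.xhtml written by the Tidy step.
            String markup = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
            String error = firstXmlError(markup);
            System.out.println(error == null ? "well-formed" : "not well-formed: " + error);
            return;
        }
        // Hypothetical samples: an HTML-style unclosed link tag and its XHTML form.
        String htmlStyle = "<html><head><link rel=\"stylesheet\" href=\"s.css\"></head><body><p>Hi</p></body></html>";
        String xhtmlStyle = "<html><head><link rel=\"stylesheet\" href=\"s.css\"/></head><body><p>Hi</p></body></html>";
        System.out.println("HTML-style:  " + firstXmlError(htmlStyle));
        System.out.println("XHTML-style: " + firstXmlError(xhtmlStyle));
    }
}
```

Running it with my_xml.xhtml as the argument should show whether the Tidy output is the problem before iText is involved at all. If the check fails on an unclosed tag, the Tidy class in org.w3c.tidy has a setXHTML(boolean) setter that can be called before parseDOM; whether that alone is enough for this particular document is an assumption that would need testing.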
