Automating Internet Explorer from Java using EZ JCom

Internet Explorer can be automated from Java using an API built by EZ JCom. Two components are involved:
  • Java API for the Web Browser
  • Java API for the HTML Parser
The Web Browser API is sufficient for automating navigation. The resulting HTML can then be parsed conveniently using the HTML Parser API.

These APIs can be built using EZ JCom, just like the Java API for any other COM application. However, the HTML Parser API is very large, so a pre-built package for Internet Explorer is provided, including sample programs. Those interested in automating Internet Explorer from Java should download this pre-packaged version instead of running the tool directly.

Sample Programs

The following are brief Internet Explorer sample programs (also included in the pre-packaged version). The samples are followed by general hints on programming with the API.
SimpleController.java   Command line controller for IE. Starts an instance of IE, and invokes API methods on it in response to command line input.
StockQuote.java   Uses IE to obtain stock quotes from several different web-sites. Shows
  • how to wait for a web-site to fully load, and
  • ad-hoc parsing of web-sites to retrieve a data item of interest.
Spider.java   Given a URL and a depth (an integer value), spiders all the links on the web-site recursively to the specified depth.
PostData.java   Shows how to POST data using the IE Java API.
JavaScriptSample.java   Shows how to call JavaScript functions in a loaded web page.
ActiveXSample.java   Shows how to access ActiveX controls in a loaded web page.
WebFrame.java   Shows how to embed Internet Explorer visually in a Swing JPanel.
TreeWalk.java   Walks the tree structure of an HTML document using MSHTML, and displays the structure in a JTree.

In addition, the files WebLoadListener.java and WebEventsAdapter.java provide supporting classes for use by these samples.
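
Several of the samples wait for a page to finish loading before reading it. The contents of WebLoadListener.java are not reproduced here, so the following is only a minimal sketch of the general idea, not the shipped implementation. It assumes that an event handler of some kind is invoked when the browser reports that the document is complete. The class name LoadWaiter and its methods are hypothetical, and the browser's event thread is simulated with an ordinary Java thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: blocks a caller until a "document complete" signal arrives.
public class LoadWaiter {
    private final CountDownLatch done = new CountDownLatch(1);

    // Assumption: this would be called from the browser's document-complete event handler.
    public void documentComplete() {
        done.countDown();
    }

    // Returns true if the signal arrived within the timeout, false on timeout or interruption.
    public boolean awaitLoad(long timeoutMillis) {
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        LoadWaiter waiter = new LoadWaiter();

        // Stand-in for the browser's event thread: signals completion after a short delay.
        Thread fakeBrowser = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            waiter.documentComplete();
        });
        fakeBrowser.start();

        System.out.println(waiter.awaitLoad(5000) ? "loaded" : "timed out");
    }
}
```

Using a timeout rather than waiting indefinitely lets the program recover if the completion event never fires.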

Image of an instance of Internet Explorer embedded within a Swing UI:

General Hints for programming with the Internet Explorer Java API

  • Microsoft has revised these APIs a number of times, so multiple versions of an interface are available, with names like IHTMLElement, IHTMLElement2, IHTMLElement3, etc., where the "2", "3", etc. refer to revisions of the earlier interface. The object returned by a method is typically a generic JComObject, which must frequently be coerced into the desired interface.

    EZ JCom's method JComObject.JComCoerceObjectToAnotherType can be used for this. A typical code fragment is:

    IHTMLDocument2 doc = (IHTMLDocument2)
      app.getDocument().JComCoerceObjectToAnotherType( IHTMLDocument2.class );
    where the value returned by getDocument() must be coerced into IHTMLDocument2. (If the returned object doesn't support IHTMLDocument2, the coercion will fail.)
  • Often, the same object can be coerced into any one of several related interfaces. For example, the value returned by app.getDocument() may be coerced to IHTMLDocument, IHTMLDocument2, IHTMLDocument3, IHTMLDocument4, or IHTMLDocument5. Choose the interface that declares the method of interest.
  • Parsing web pages for data items within them can be difficult and may require ad-hoc solutions. Scan the source of the web pages and identify patterns around the data of interest. Some examples of such patterns are provided in the StockQuote sample. Note that such solutions are fragile: if the website layout changes, a new ad-hoc solution may be required.
  • The application can be made visible or left invisible. During development, it is recommended to leave it visible.
  • Internet Explorer can be used as an engine to retrieve and parse the contents of web sites, and it can be left invisible for such usage. However, if it is left invisible, a workaround must be added for an Internet Explorer bug. The bug is documented at http://support.microsoft.com/kb/259935, and the workaround simply involves calling the API method "setLeft" with the value of "- getWidth()". The samples "StockQuote.java" and "Spider.java" show this workaround. Without this workaround, the "Document Complete" event may not fire in some cases, so the Java program cannot reliably wait for the page load to complete.
  • Before exiting the program, the method Quit() should be called to exit the instance of Internet Explorer. Some navigations or API methods may also cause other threads to start running, in which case System.exit may need to be called to force the Java application to exit.
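
The hint above about ad-hoc parsing can be illustrated with a small, self-contained sketch. It does not reproduce the StockQuote sample; it only shows the general approach of spotting a pattern in the page source and extracting a value with a regular expression. The markup (a span with id "lastPrice") and the class name QuoteScraper are hypothetical, and in practice the HTML string would come from the loaded document rather than a literal.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example of ad-hoc extraction from page source.
public class QuoteScraper {
    // Assumed page pattern: <span id="lastPrice">123.45</span>
    private static final Pattern LAST_PRICE = Pattern.compile(
            "<span\\s+id=\"lastPrice\"[^>]*>\\s*([0-9]+(?:\\.[0-9]+)?)\\s*</span>",
            Pattern.CASE_INSENSITIVE);

    // Returns the first price matching the assumed pattern, or empty if none is found.
    public static Optional<Double> extractLastPrice(String html) {
        Matcher m = LAST_PRICE.matcher(html);
        if (m.find()) {
            return Optional.of(Double.parseDouble(m.group(1)));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<html><body><SPAN id=\"lastPrice\"> 123.45 </SPAN></body></html>";
        Optional<Double> price = extractLastPrice(html);
        System.out.println(price.isPresent() ? "price = " + price.get() : "pattern not found");
    }
}
```

Returning an empty Optional when the pattern is missing makes a layout change on the site show up as an explicit "not found" case instead of an exception deep in the parsing code.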