How to use the JavaScript library pdf.js in Selenium with Java through the JavascriptExecutor interface

Question


I found this library, which does exactly what I need: it extracts the text from a PDF and returns it as a String.

After a lot of research, the version I found appears to be the most recent release of pdf.js. However, I have not managed to open the PDF file in the browser, get the library loaded, and then use its methods to copy the text.

I have searched extensively and, although I am no JavaScript expert, I found an approach that seems ideal; however, I could not adapt it to Selenium's JavascriptExecutor.

Here is my attempt at calling the index of the first example:

    driver.get("file:///C:/Users/user/Desktop/arquivo.pdf");

    JavascriptExecutor jse = (JavascriptExecutor) driver;

    String script1 = "id=\"pdf-js\"";
    String script2 = "src=\"projeto/src/test/resources/js/pdf.js\"";
    String script3 = "PDFJS.workerSrc = cslight/src/test/resources/js/pdf.js";
    String script4 = "src=\"/projeto/src/test/resources/js/app.js\"";
    String script5 = "var app = new App;";

    jse.executeScript(script1);
    jse.executeScript(script2);
    jse.executeScript(script3);
    jse.executeScript(script4);
    jse.executeScript(script5);

This is the error I get:

Exception in thread "main" org.openqa.selenium.WebDriverException: unknown error: PDFJS is not defined

(Session info: chrome=65.0.3325.181)
(Driver info: chromedriver=2.37.544315 (730aa6a5fdba159ac9f4c1e8cbc59bf1b5ce12b7), platform=Windows NT 10.0.14393 x86_64) (WARNING: The server did not provide any stacktrace information)
Command duration or timeout: 0 milliseconds
Build info: version: '3.5.3', revision: 'a88d25fe6b', time: '2017-08-29T12:42:44.417Z'
System info: host: 'NC0048', ip: '10.13.30.196', os.name: 'Windows 10', os.arch: 'amd64', os.version: '10.0', java.version: '1.8.0_161'
Driver info: org.openqa.selenium.chrome.ChromeDriver
Capabilities [{mobileEmulationEnabled=false, hasTouchScreen=false, platform=XP, acceptSslCerts=false, acceptInsecureCerts=false, webStorageEnabled=true, browserName=chrome, takesScreenshot=true, javascriptEnabled=true, platformName=XP, setWindowRect=true, unexpectedAlertBehaviour=, applicationCacheEnabled=false, rotatable=false, networkConnectionEnabled=false, chrome={chromedriverVersion=2.37.544315 (730aa6a5fdba159ac9f4c1e8cbc59bf1b5ce12b7), userDataDir=C:\Users\ICARO~1.PRA\AppData\Local\Temp\scoped_dir17892_11337}, takesHeapSnapshot=true, pageLoadStrategy=normal, unhandledPromptBehavior=, databaseEnabled=false, handlesAlerts=true, version=65.0.3325.181, browserConnectionEnabled=false, nativeEvents=true, locationContextEnabled=true, cssSelectorsEnabled=true}]
Session ID: 757fa21a22500f6618317bc12d5799ce
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.openqa.selenium.remote.ErrorHandler.createThrowable(ErrorHandler.java:215)
    at org.openqa.selenium.remote.ErrorHandler.throwIfResponseFailed(ErrorHandler.java:167)
    at org.openqa.selenium.remote.http.JsonHttpResponseCodec.reconstructValue(JsonHttpResponseCodec.java:40)
    at org.openqa.selenium.remote.http.AbstractHttpResponseCodec.decode(AbstractHttpResponseCodec.java:82)
    at org.openqa.selenium.remote.http.AbstractHttpResponseCodec.decode(AbstractHttpResponseCodec.java:45)
    at org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:164)
    at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:82)
    at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:646)
    at org.openqa.selenium.remote.RemoteWebDriver.executeScript(RemoteWebDriver.java:582)
    at br.com.conductor.test.GenericTester.tester(GenericTester.java:40)
    at br.com.conductor.test.GenericTester.main(GenericTester.java:61)
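
The message is consistent with the scripts in the attempt: `id="pdf-js"` and `src="..."` are attributes copied from an HTML `<script>` tag, so when run as JavaScript they only assign variables and never load pdf.js, and the third script then fails on the undefined `PDFJS`. A minimal sketch of one way to load a library through `executeAsyncScript` follows. It assumes pdf.js is served from a URL the page is allowed to load (the `http://localhost:8000/js/pdf.js` address is a placeholder, not taken from the question), and `PdfJsLoader` and `LOADER_SCRIPT` are hypothetical names. Whether a script can be injected into the tab where Chrome displays a PDF with its built-in viewer is not verified here.

```java
// Sketch only: builds the JavaScript that Selenium would inject into the page.
final class PdfJsLoader {

    // Intended for JavascriptExecutor.executeAsyncScript:
    // arguments[0] is the pdf.js URL, the last argument is Selenium's completion callback.
    static final String LOADER_SCRIPT = String.join("\n",
        "var url = arguments[0];",
        "var done = arguments[arguments.length - 1];",
        "var tag = document.createElement('script');",
        "tag.src = url;",
        "tag.onload = function () {",
        "  // Report whether a pdf.js global is now present.",
        "  done(typeof PDFJS !== 'undefined' || typeof pdfjsLib !== 'undefined');",
        "};",
        "tag.onerror = function () { done(false); };",
        "(document.head || document.documentElement).appendChild(tag);");

    public static void main(String[] args) {
        // With Selenium on the classpath, the call would look like this
        // (the URL is a placeholder assumption):
        //   Object loaded = ((JavascriptExecutor) driver)
        //       .executeAsyncScript(LOADER_SCRIPT, "http://localhost:8000/js/pdf.js");
        System.out.println(LOADER_SCRIPT);
    }
}
```

Depending on the release, pdf.js exposes its API as the global `PDFJS` (older versions) or `pdfjsLib` (newer ones), which is why the callback checks for both before reporting success.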

javascript java selenium jspdf

asked by anonymous 12.04.2018 / 16:28

1 answer


score 0 · Accepted Answer

Here are two libraries you can add to your Maven project to read PDFs:

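    <dependency>
        <groupId>com.itextpdf</groupId>
        <artifactId>itextpdf</artifactId>
        <version>5.5.13</version>
    </dependency>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.9</version>
    </dependency>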


package testcases;

import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.io.RandomAccessBufferedFileInputStream;
import org.apache.pdfbox.io.RandomAccessRead;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import org.junit.Test;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

public class PdfTest {

    // Remote PDF read by the iText test.
    private final String pdfUrl = "http://files.isec.pt/DOCUMENTOS/SERVICOS/BIBLIO/teses/Tese_Mest_Marcio-Carvalho.pdf";
    // Local PDF read by the PDFBox test.
    private final String pdfPath = "/home/diamaral/Documentos/diamaral/test.pdf";

    // "Read PDF content using the iText API"
    @Test
    public void lerConteudoPdfUsandoApiIText() throws IOException {
        PdfReader pdfReader = new PdfReader(pdfUrl);
        try {
            // Extracts the text of page 1 only.
            System.out.println("\n\n---------API ITEXT-----------------------------"
                    + PdfTextExtractor.getTextFromPage(pdfReader, 1));
        } finally {
            // Release the reader even if extraction fails.
            pdfReader.close();
        }
    }

    // "Read PDF using the PDFBox API"
    @Test
    public void lerPdfUsandoApiPdfBox() throws IOException {
        RandomAccessRead doc = new RandomAccessBufferedFileInputStream(new File(pdfPath));
        PDFParser parser = new PDFParser(doc);
        parser.parse();
        // try-with-resources closes the document even if text extraction fails.
        try (PDDocument pdfDoc = parser.getPDDocument()) {
            PDFTextStripper stripper = new PDFTextStripper();
            // getText returns the text of the whole document.
            System.out.println("\n\n---------API PDFBOX-----------------------------"
                    + stripper.getText(pdfDoc));
        }
    }

}
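
The PDFBox test reads from a local path, whereas in the question the PDF is reached through the browser. One possible bridge, sketched here under assumptions, is to download the file behind a URL (for example the value returned by `driver.getCurrentUrl()`) to a temporary file and hand that file to PDFBox. `PdfDownloader` and `downloadToTempFile` are hypothetical names, and the sketch uses only the Java standard library, so it does not carry over cookies or a login from the Selenium session.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch only: fetches a PDF by URL so that a file-based parser can read it.
final class PdfDownloader {

    // Copies the resource behind pdfUrl into a new temporary file and returns its path.
    static Path downloadToTempFile(String pdfUrl) throws IOException {
        Path target = Files.createTempFile("downloaded-", ".pdf");
        try (InputStream in = URI.create(pdfUrl).toURL().openStream()) {
            // createTempFile already created the file, so it must be overwritten.
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: PdfDownloader <pdf-url>");
            return;
        }
        Path pdf = downloadToTempFile(args[0]);
        // pdf.toFile() could then replace `new File(pdfPath)` in the PDFBox test.
        System.out.println("Saved to " + pdf);
    }
}
```

A `file:///` URL works here as well as an `http://` one, so the same helper covers the local file opened in the question.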