
Comparing changes

base repository: apache/tika
base: 1.28.4
head repository: apache/tika
compare: 1.28.5

Commits on Jun 13, 2022

  1. 78aed18 (verified signature: tballison, Tim Allison)

Commits on Jun 17, 2022

  1. 0750891
  2. TIKA-3793: update h2

    THausherr committed Jun 17, 2022
    67205b9
  3. 0344d17
  4. aeebd82

Commits on Jun 19, 2022

  1. TIKA-3793: update lucene

    THausherr committed Jun 19, 2022
    9caf03d

Commits on Jul 2, 2022

  1. TIKA-3793: update cxf

    THausherr committed Jul 2, 2022
    9536fff

Commits on Jul 7, 2022

  1. 9b41d6a
  2. TIKA-3793: update rat

    THausherr committed Jul 7, 2022
    95c9f26
  3. 9b6f886

Commits on Jul 8, 2022

  1. cc52436

Commits on Jul 11, 2022

  1. 3dcdfff

Commits on Jul 12, 2022

  1. 0a6da81

Commits on Jul 16, 2022

  1. TIKA-3793: update spring

    THausherr committed Jul 16, 2022
    7351374
  2. 5a7ebd0

Commits on Jul 22, 2022

  1. TIKA-3793: update protobuf

    THausherr committed Jul 22, 2022
    fb1bd53

Commits on Jul 27, 2022

  1. TIKA-3793: update protobuf

    THausherr committed Jul 27, 2022
    54faff2

Commits on Jul 30, 2022

  1. TIKA-3793: update ddplist

    THausherr committed Jul 30, 2022
    0557791

Commits on Aug 1, 2022

  1. 3b88f83
  2. TIKA-3793: update gson

    THausherr committed Aug 1, 2022
    a7638ea
  3. [TIKA-3825] ForkClient to check for thread interrupted status when waiting for response. Add test to ForkParserTest to demonstrate issue and fix (#633)

    TheHound authored Aug 1, 2022 (created on GitHub.com; verified GitHub signature, key since expired)
    ae61117
  4. 3453da2

Commits on Aug 5, 2022

  1. b77e8ba (verified signature: tballison, Tim Allison)

Commits on Aug 6, 2022

  1. TIKA-3793: update sqlite

    THausherr committed Aug 6, 2022
    c54f217

Commits on Aug 9, 2022

  1. TIKA-3793: update junrar

    THausherr committed Aug 9, 2022
    1f2b695

Commits on Aug 10, 2022

  1. TIKA-3793: add comment

    THausherr committed Aug 10, 2022
    ac05f4b
  2. c12681a
  3. TIKA-3793: update protobuf

    THausherr committed Aug 10, 2022
    63270c7
  4. TIKA-3793: update objenesis

    THausherr committed Aug 10, 2022
    ffed947

Commits on Aug 12, 2022

  1. d312298

Commits on Aug 17, 2022

  1. TIKA-3793: update mockito

    THausherr committed Aug 17, 2022
    ebbdbf9
  2. 08d9b66

Commits on Aug 19, 2022

  1. TIKA-3793: update joda-time

    THausherr committed Aug 19, 2022
    334d1dd

Commits on Aug 20, 2022

  1. 48a4ff5

Commits on Aug 25, 2022

  1. TIKA-3793: update sqlite

    THausherr committed Aug 25, 2022
    50f3272

Commits on Aug 30, 2022

  1. TIKA-3793: update jackcess

    THausherr committed Aug 30, 2022
    b50c154

Commits on Aug 31, 2022

  1. update developer list

    THausherr committed Aug 31, 2022
    73e849d

Commits on Sep 5, 2022

  1. TIKA-3793: update jackson

    THausherr committed Sep 5, 2022
    0248b80

Commits on Sep 8, 2022

  1. 72feaa6
  2. db523d2
  3. TIKA-3793: update opencsv

    THausherr committed Sep 8, 2022
    d26546b
  4. TIKA-3793: update mockito

    THausherr committed Sep 8, 2022
    65dd3d6
  5. prep for 1.28.5-rc1

    tballison committed Sep 8, 2022 (verified signature: Tim Allison)
    5d2ee3b
  6. b694f15 (verified signature: tballison, Tim Allison)
  7. 86a78ca (verified signature: tballison, Tim Allison)
7 changes: 7 additions & 0 deletions CHANGES.txt
@@ -1,3 +1,10 @@
Release 1.28.5 - 9/8/2022

* General dependency upgrades (TIKA-3793).

* Avoid infinite loop in bookmark extraction from PDFs (TIKA-3832).


Release 1.28.4 - 6/13/2022

* General dependency upgrades (TIKA-3780).
3 changes: 2 additions & 1 deletion pom.xml
@@ -25,7 +25,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>1.28.4</version>
<version>1.28.5</version>
<relativePath>tika-parent/pom.xml</relativePath>
</parent>

@@ -183,6 +183,7 @@ least three +1 Tika PMC votes are cast.
<excludes>
<exclude>CHANGES.txt</exclude>
<exclude>README.md</exclude>
<exclude>.gitattributes</exclude>
</excludes>
</configuration>
</plugin>
4 changes: 2 additions & 2 deletions tika-app/pom.xml
@@ -25,7 +25,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>1.28.4</version>
<version>1.28.5</version>
<relativePath>../tika-parent/pom.xml</relativePath>
</parent>

@@ -246,7 +246,7 @@
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>build-helper-maven-plugin</artifactId>
<version>3.0.0</version>
<version>3.3.0</version>
<executions>
<execution>
<phase>package</phase>
2 changes: 1 addition & 1 deletion tika-batch/pom.xml
@@ -25,7 +25,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>1.28.4</version>
<version>1.28.5</version>
<relativePath>../tika-parent/pom.xml</relativePath>
</parent>

2 changes: 1 addition & 1 deletion tika-bundle/pom.xml
@@ -25,7 +25,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>1.28.4</version>
<version>1.28.5</version>
<relativePath>../tika-parent/pom.xml</relativePath>
</parent>

2 changes: 1 addition & 1 deletion tika-core/pom.xml
@@ -25,7 +25,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>1.28.4</version>
<version>1.28.5</version>
<relativePath>../tika-parent/pom.xml</relativePath>
</parent>

3 changes: 2 additions & 1 deletion tika-core/src/main/java/org/apache/tika/fork/ForkClient.java
@@ -279,7 +279,7 @@ public synchronized void close() {
private Throwable waitForResponse(List<ForkResource> resources)
throws IOException {
output.flush();
while (true) {
while (!Thread.currentThread().isInterrupted()) {
int type = input.read();
if (type == -1) {
throw new IOException(
@@ -300,6 +300,7 @@ private Throwable waitForResponse(List<ForkResource> resources)
return null;
}
}
throw new IOException(new InterruptedException());
}

/**
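The `ForkClient` change swaps an unconditional `while (true)` for a loop that re-checks the calling thread's interrupt status before each `read()`, and reports an interrupt as an `IOException` wrapping an `InterruptedException`. The sketch below reproduces that pattern outside Tika under stated assumptions: `InterruptAwareReader`, `readUntilTerminator`, and the use of byte `0` as an end-of-response marker are hypothetical stand-ins for Tika's real message protocol. Note that the check runs only between reads; a thread blocked inside a plain `InputStream.read()` is generally not woken by `Thread.interrupt()`, so the loop exits once the pending read returns.

```java
import java.io.IOException;
import java.io.InputStream;

public class InterruptAwareReader {

    // Hypothetical end-of-response marker; Tika's real protocol uses typed messages.
    static final int TERMINATOR = 0;

    /**
     * Reads bytes until TERMINATOR and returns how many payload bytes preceded it.
     * Like the patched waitForResponse, it re-checks the interrupt flag on every
     * iteration and reports an interrupt as an IOException.
     */
    static int readUntilTerminator(InputStream input) throws IOException {
        int payloadBytes = 0;
        while (!Thread.currentThread().isInterrupted()) {
            int b = input.read();
            if (b == -1) {
                throw new IOException("Stream ended before the terminator was read");
            }
            if (b == TERMINATOR) {
                return payloadBytes;
            }
            payloadBytes++;
        }
        // isInterrupted() does not clear the flag, so callers can still observe it.
        throw new IOException(new InterruptedException());
    }
}
```

Because `isInterrupted()` leaves the flag set, code further up the stack (for example an `ExecutorService` worker that is being shut down) can still see the interrupt after the `IOException` propagates.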
39 changes: 39 additions & 0 deletions tika-core/src/test/java/org/apache/tika/fork/ForkParserTest.java
@@ -37,9 +37,15 @@
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

import org.apache.tika.TikaTest;
import org.apache.tika.exception.TikaException;
@@ -460,6 +466,39 @@ public void testNoUTFDataFormatException() throws Exception {
proxy.skippedEntity(sb.toString());
}

@Test
public void testForkParserDoesntPreventShutdown() throws Exception {
ExecutorService service = Executors.newFixedThreadPool(1);
CountDownLatch cdl = new CountDownLatch(1);
service.submit(() -> {
try (ForkParser parser = new ForkParser(ForkParserTest.class.getClassLoader(),
new ForkTestParser.ForkTestParserWaiting())) {
Metadata metadata = new Metadata();
ContentHandler output = new BodyContentHandler();
InputStream stream = new ByteArrayInputStream(new byte[0]);
ParseContext context = new ParseContext();
cdl.countDown();
parser.parse(stream, output, metadata, context);
// Don't care about output not planning to get this far
} catch (IOException | SAXException | TikaException e) {
throw new RuntimeException(e);
}
});
// Wait to make sure submitted runnable is actually running
boolean await = cdl.await(1, TimeUnit.SECONDS);
if (!await) {
// This should never happen but be thorough
fail("Future never ran so cannot test cancellation");
}
// Parse is being called try and shutdown
Instant requestShutdown = Instant.now();
service.shutdownNow();
service.awaitTermination(15, TimeUnit.SECONDS);
long secondsSinceShutdown = ChronoUnit.SECONDS.between(requestShutdown, Instant.now());
assertTrue("Should have shutdown the service in less than 5 seconds", secondsSinceShutdown < 5);
}


//use this to test that the wrapper handler is acted upon by the server but not proxied back
private static class ToFileHandler extends AbstractRecursiveParserWrapperHandler {

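The new `testForkParserDoesntPreventShutdown` relies on `ExecutorService.shutdownNow()` interrupting the worker thread, then times how long termination takes. The standalone sketch below shows the same mechanics with `Thread.sleep` standing in for the blocked parse; `ShutdownNowDemo` and `shutdownLatencyMillis` are illustrative names, not Tika APIs. Unlike the test, it checks the boolean returned by `awaitTermination` and measures elapsed time in milliseconds rather than whole seconds.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ShutdownNowDemo {

    /** Returns how long, in milliseconds, the pool took to terminate after shutdownNow(). */
    static long shutdownLatencyMillis() throws InterruptedException {
        ExecutorService service = Executors.newFixedThreadPool(1);
        CountDownLatch started = new CountDownLatch(1);
        service.submit(() -> {
            started.countDown();
            try {
                Thread.sleep(10_000); // stands in for a parse that blocks for a long time
            } catch (InterruptedException e) {
                // Restore the flag so anything further up the stack can see the interrupt.
                Thread.currentThread().interrupt();
            }
        });
        // Make sure the task is running before trying to cancel it.
        if (!started.await(1, TimeUnit.SECONDS)) {
            throw new IllegalStateException("task never started");
        }
        long start = System.nanoTime();
        service.shutdownNow(); // interrupts the worker thread
        if (!service.awaitTermination(15, TimeUnit.SECONDS)) {
            throw new IllegalStateException("pool did not terminate");
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }
}
```

If the task swallowed the interrupt and kept sleeping, `awaitTermination` would wait the full 10 seconds, which is the behavior the Tika test is designed to catch.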
15 changes: 14 additions & 1 deletion tika-core/src/test/java/org/apache/tika/fork/ForkTestParser.java
@@ -64,4 +64,17 @@ public void parse(InputStream stream, ContentHandler handler, Metadata metadata,
super.parse(stream, handler, metadata, context);
}
}
}

static class ForkTestParserWaiting extends ForkTestParser {
@Override
public void parse(InputStream stream, ContentHandler handler, Metadata metadata,
ParseContext context) throws IOException, SAXException, TikaException {
try {
Thread.sleep(10_000);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
super.parse(stream, handler, metadata, context);
}
}
}
8 changes: 4 additions & 4 deletions tika-dl/pom.xml
@@ -24,7 +24,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>1.28.4</version>
<version>1.28.5</version>
<relativePath>../tika-parent/pom.xml</relativePath>
</parent>

@@ -37,7 +37,7 @@
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<dl4j.version>1.0.0-beta6</dl4j.version>
<twelvemonkeys.version>3.8.2</twelvemonkeys.version>
<twelvemonkeys.version>3.8.3</twelvemonkeys.version>
</properties>

<dependencies>
@@ -298,7 +298,7 @@
<dependency>
<groupId>org.objenesis</groupId>
<artifactId>objenesis</artifactId>
<version>3.2</version>
<version>3.3</version>
</dependency>
<dependency>
<groupId>org.nd4j</groupId>
@@ -375,7 +375,7 @@
<dependency>
<groupId>joda-time</groupId>
<artifactId>joda-time</artifactId>
<version>2.10.14</version>
<version>2.11.1</version>
</dependency>
<dependency>
<groupId>commons-io</groupId>
4 changes: 2 additions & 2 deletions tika-eval/pom.xml
@@ -25,7 +25,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>1.28.4</version>
<version>1.28.5</version>
<relativePath>../tika-parent/pom.xml</relativePath>
</parent>

@@ -65,7 +65,7 @@
<dependency>
<groupId>com.h2database</groupId>
<artifactId>h2</artifactId>
<version>2.1.212</version>
<version>2.1.214</version>
</dependency>
<dependency>
<groupId>commons-cli</groupId>
8 changes: 4 additions & 4 deletions tika-example/pom.xml
@@ -25,7 +25,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>1.28.4</version>
<version>1.28.5</version>
<relativePath>../tika-parent/pom.xml</relativePath>
</parent>

@@ -100,7 +100,7 @@
<dependency>
<groupId>org.apache.jackrabbit</groupId>
<artifactId>jackrabbit-jcr-server</artifactId>
<version>2.21.10</version>
<version>${jackrabbit.version}</version>
<exclusions>
<exclusion>
<groupId>org.apache.tika</groupId>
@@ -127,7 +127,7 @@
<dependency>
<groupId>org.apache.jackrabbit</groupId>
<artifactId>jackrabbit-core</artifactId>
<version>2.21.10</version>
<version>${jackrabbit.version}</version>
<exclusions>
<exclusion>
<groupId>org.apache.tika</groupId>
@@ -156,7 +156,7 @@
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-context</artifactId>
<version>5.3.20</version>
<version>5.3.22</version>
<exclusions>
<exclusion>
<groupId>commons-logging</groupId>
2 changes: 1 addition & 1 deletion tika-fuzzing/pom.xml
@@ -21,7 +21,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>1.28.4</version>
<version>1.28.5</version>
<relativePath>../tika-parent/pom.xml</relativePath>
</parent>

2 changes: 1 addition & 1 deletion tika-java7/pom.xml
@@ -25,7 +25,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>1.28.4</version>
<version>1.28.5</version>
<relativePath>../tika-parent/pom.xml</relativePath>
</parent>

2 changes: 1 addition & 1 deletion tika-langdetect/pom.xml
@@ -25,7 +25,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>1.28.4</version>
<version>1.28.5</version>
<relativePath>../tika-parent/pom.xml</relativePath>
</parent>

2 changes: 1 addition & 1 deletion tika-nlp/pom.xml
@@ -24,7 +24,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>1.28.4</version>
<version>1.28.5</version>
<relativePath>../tika-parent/pom.xml</relativePath>
</parent>

34 changes: 22 additions & 12 deletions tika-parent/pom.xml
@@ -25,13 +25,13 @@
<parent>
<groupId>org.apache</groupId>
<artifactId>apache</artifactId>
<version>24</version>
<version>27</version>
<relativePath />
</parent>

<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>1.28.4</version>
<version>1.28.5</version>
<packaging>pom</packaging>

<name>Apache Tika parent</name>
@@ -232,6 +232,14 @@
<role>committer</role>
</roles>
</developer>
<developer>
<name>Tilman Hausherr</name>
<id>tilman</id>
<timezone>Europe/Berlin</timezone>
<roles>
<role>committer</role>
</roles>
</developer>
</developers>
<contributors>
<contributor>
@@ -263,39 +271,41 @@
<forbiddenapis.version>3.3</forbiddenapis.version>
<groovy.maven.version>2.1.1</groovy.maven.version>
<maven.antrun.version>1.8</maven.antrun.version>
<maven.assembly.version>3.3.0</maven.assembly.version>
<maven.bundle.version>5.1.6</maven.bundle.version>
<maven.assembly.version>3.4.1</maven.assembly.version>
<maven.bundle.version>5.1.8</maven.bundle.version>
<maven.failsafe.version>2.22.2</maven.failsafe.version>
<maven.javadoc.version>3.3.1</maven.javadoc.version>
<maven.scr.version>1.26.4</maven.scr.version>
<maven.surefire.version>3.0.0-M6</maven.surefire.version>
<maven.shade.version>3.3.0</maven.shade.version>
<rat.version>0.13</rat.version>
<rat.version>0.14</rat.version>
<!-- NOTE: sync tukaani version with commons-compress in tika-parsers -->
<poi.version>5.2.2</poi.version>
<commons.compress.version>1.21</commons.compress.version>
<commons.io.version>2.11.0</commons.io.version>
<commons.lang3.version>3.12.0</commons.lang3.version>
<gson.version>2.9.0</gson.version>
<gson.version>2.9.1</gson.version>
<guava.version>31.1-jre</guava.version>
<osgi.core.version>6.0.0</osgi.core.version>

<cxf.version>3.5.2</cxf.version>
<cxf.version>3.5.3</cxf.version>
<slf4j.version>1.7.36</slf4j.version>
<!-- can't update to 2.18.0, see TIKA-3813 -->
<log4j2.version>2.17.2</log4j2.version>
<jackson.version>2.13.3</jackson.version>
<jackson.version>2.13.4</jackson.version>
<!-- when this is next upgraded, see if we can get rid of
javax.activation dependency in tika-server.
Until then, DO NOT go above 2.x unless you know what you're doing.
See TIKA-3407 -->
<jaxb.version>2.3.5</jaxb.version>
<cli.version>1.5.0</cli.version>
<lucene.version>8.11.1</lucene.version>
<mockito.version>4.6.1</mockito.version>
<lucene.version>8.11.2</lucene.version>
<mockito.version>4.8.0</mockito.version>
<lombok.version>1.18.24</lombok.version>
<!-- 2.0.0 doesn't work with jdk8 -->
<opennlp.version>1.9.4</opennlp.version>
<xerces.version>2.12.2</xerces.version>
<jackrabbit.version>2.21.12</jackrabbit.version>
</properties>
<dependencyManagement>
<dependencies>
@@ -424,7 +434,7 @@
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>versions-maven-plugin</artifactId>
<version>2.10.0</version>
<version>2.12.0</version>
<configuration>
<generateBackupPoms>false</generateBackupPoms>
</configuration>
@@ -683,6 +693,6 @@
<connection>scm:git:https://github.com/apache/</connection>
<developerConnection>scm:git:https://github.com/apache/</developerConnection>
<url>https://github.com/apache/tika</url>
<tag>1.28.4-rc1</tag>
<tag>1.28.5-rc1</tag>
</scm>
</project>
18 changes: 9 additions & 9 deletions tika-parsers/pom.xml
@@ -25,7 +25,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>1.28.4</version>
<version>1.28.5</version>
<relativePath>../tika-parent/pom.xml</relativePath>
</parent>

@@ -93,7 +93,7 @@
<dependency>
<groupId>com.fasterxml.woodstox</groupId>
<artifactId>woodstox-core</artifactId>
<version>6.2.8</version>
<version>6.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
@@ -140,7 +140,7 @@
<dependency>
<groupId>com.googlecode.plist</groupId>
<artifactId>dd-plist</artifactId>
<version>1.23</version>
<version>1.24</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
@@ -266,7 +266,7 @@
<dependency>
<groupId>com.healthmarketscience.jackcess</groupId>
<artifactId>jackcess</artifactId>
<version>4.0.1</version>
<version>4.0.2</version>
<exclusions>
<exclusion>
<groupId>org.apache.commons</groupId>
@@ -372,7 +372,7 @@
<dependency>
<groupId>com.github.junrar</groupId>
<artifactId>junrar</artifactId>
<version>7.5.2</version>
<version>7.5.3</version>
<exclusions>
<exclusion>
<groupId>commons-logging</groupId>
@@ -417,7 +417,7 @@
<dependency>
<groupId>org.xerial</groupId>
<artifactId>sqlite-jdbc</artifactId>
<version>3.36.0.3</version>
<version>3.39.2.1</version>
<scope>provided</scope>
</dependency>

@@ -571,12 +571,12 @@
<dependency>
<groupId>net.java.dev.jna</groupId>
<artifactId>jna</artifactId>
<version>5.11.0</version>
<version>5.12.1</version>
</dependency>
<dependency>
<groupId>com.google.protobuf</groupId>
<artifactId>protobuf-java</artifactId>
<version>3.19.4</version>
<version>3.21.5</version>
</dependency>
<dependency>
<groupId>edu.ucar</groupId>
@@ -1076,7 +1076,7 @@
<dependency>
<groupId>org.apache.maven</groupId>
<artifactId>maven-model</artifactId>
<version>3.3.3</version>
<version>3.8.6</version>
</dependency>
<dependency>
<groupId>org.codehaus.groovy</groupId>
Original file line number Diff line number Diff line change
@@ -52,6 +52,7 @@
import org.apache.pdfbox.pdmodel.PDEmbeddedFilesNameTreeNode;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageTree;
import org.apache.pdfbox.pdmodel.common.COSObjectable;
import org.apache.pdfbox.pdmodel.common.PDDestinationOrAction;
import org.apache.pdfbox.pdmodel.common.PDNameTreeNode;
import org.apache.pdfbox.pdmodel.common.filespecification.PDComplexFileSpecification;
@@ -145,6 +146,8 @@ enum ActionTrigger {
*/
private final static int MAX_RECURSION_DEPTH = 100;

private final static int MAX_BOOKMARK_ITEMS = 10000;

private final static TesseractOCRConfig DEFAULT_TESSERACT_CONFIG = new TesseractOCRConfig();

private static final MediaType XFA_MEDIA_TYPE = MediaType.application("vnd.adobe.xdp+xml");
@@ -729,23 +732,38 @@ protected void endDocument(PDDocument pdf) throws IOException {
void extractBookmarkText() throws SAXException, IOException, TikaException {
PDDocumentOutline outline = document.getDocumentCatalog().getDocumentOutline();
if (outline != null) {
extractBookmarkText(outline);
Set<COSObjectable> seen = new HashSet<>();
extractBookmarkText(outline, seen, 0);
}
}

void extractBookmarkText(PDOutlineNode bookmark) throws SAXException, IOException, TikaException {
void extractBookmarkText(PDOutlineNode bookmark, Set<COSObjectable> seen, int itemCount)
throws SAXException, IOException, TikaException {
PDOutlineItem current = bookmark.getFirstChild();

if (current != null) {
if (seen.contains(current)) {
return;
}
if (itemCount > MAX_BOOKMARK_ITEMS) {
return;
}
xhtml.startElement("ul");
while (current != null) {
if (seen.contains(current)) {
break;
}
if (itemCount > MAX_BOOKMARK_ITEMS) {
break;
}
seen.add(current);
xhtml.startElement("li");
xhtml.characters(current.getTitle());
xhtml.endElement("li");
handleDestinationOrAction(current.getAction(), ActionTrigger.BOOKMARK);
// Recurse:
extractBookmarkText(current);
extractBookmarkText(current, seen, itemCount + 1);
current = current.getNextSibling();
itemCount++;
}
xhtml.endElement("ul");
}
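The TIKA-3832 fix guards the outline walk in two ways: a `seen` set stops the loop when a malformed PDF links a bookmark back to an item already visited, and `MAX_BOOKMARK_ITEMS` caps the work. The sketch below applies the same idea to a hypothetical `Node` class whose `firstChild`/`nextSibling` links stand in for PDFBox's `PDOutlineItem`; `BookmarkWalker` and `collectTitles` are illustrative names. One deliberate difference: in the hunk above, `itemCount` is passed by value, so it reflects the position along the current path (preceding siblings plus nesting depth) rather than the total number of items emitted, whereas this sketch uses the size of one shared output list as a global cap.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BookmarkWalker {

    /** Hypothetical stand-in for an outline item: a first-child / next-sibling linked tree. */
    static final class Node {
        final String title;
        Node firstChild;
        Node nextSibling;

        Node(String title) {
            this.title = title;
        }
    }

    // Same value as MAX_BOOKMARK_ITEMS in the diff; here it caps the total across the outline.
    static final int MAX_ITEMS = 10_000;

    /** Collects titles depth-first, never visiting a node twice and never exceeding MAX_ITEMS. */
    static List<String> collectTitles(Node root) {
        List<String> titles = new ArrayList<>();
        walk(root, new HashSet<>(), titles);
        return titles;
    }

    private static void walk(Node parent, Set<Node> seen, List<String> titles) {
        Node current = parent.firstChild;
        // Stop on a repeated node (a cycle in a malformed file) or once the cap is reached.
        while (current != null && !seen.contains(current) && titles.size() < MAX_ITEMS) {
            seen.add(current);
            titles.add(current.title);
            walk(current, seen, titles); // recurse into this item's children
            current = current.nextSibling;
        }
    }
}
```

For example, if `b.nextSibling` points back to `a` and `b.firstChild` also points to `a`, the walk emits `a` and `b` once each and then stops instead of looping forever.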
2 changes: 1 addition & 1 deletion tika-serialization/pom.xml
@@ -25,7 +25,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>1.28.4</version>
<version>1.28.5</version>
<relativePath>../tika-parent/pom.xml</relativePath>
</parent>

6 changes: 3 additions & 3 deletions tika-server/pom.xml
@@ -20,7 +20,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>1.28.4</version>
<version>1.28.5</version>
<relativePath>../tika-parent/pom.xml</relativePath>
</parent>

@@ -29,7 +29,7 @@
<url>http://tika.apache.org/</url>

<properties>
<cxf.micrometer.version>1.9.0</cxf.micrometer.version>
<cxf.micrometer.version>1.9.3</cxf.micrometer.version>
<micrometer-extras.version>0.2.2</micrometer-extras.version>
</properties>

@@ -70,7 +70,7 @@
<dependency>
<groupId>com.opencsv</groupId>
<artifactId>opencsv</artifactId>
<version>5.6</version>
<version>5.7.0</version>
</dependency>
<!-- avoid org.apache.commons:commons-text:1.9 dependent on org.apache.commons:commons-lang3:3.11 -->
<dependency>
2 changes: 1 addition & 1 deletion tika-translate/pom.xml
Original file line number Diff line number Diff line change
@@ -25,7 +25,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>1.28.4</version>
<version>1.28.5</version>
<relativePath>../tika-parent/pom.xml</relativePath>
</parent>

2 changes: 1 addition & 1 deletion tika-xmp/pom.xml
Original file line number Diff line number Diff line change
@@ -25,7 +25,7 @@
<parent>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parent</artifactId>
<version>1.28.4</version>
<version>1.28.5</version>
<relativePath>../tika-parent/pom.xml</relativePath>
</parent>