Exclude deleted documents when extracting stored fields
When reading stored fields, `loadDocument` calls
`IndexReader.storedFields().document(id, ...)` to get back the stored
fields for a given document.  However, Lucene doesn't automatically
check that the requested document is still live:

> NOTE: for performance reasons, this method does not check if the
> requested document is deleted, and therefore asking for a deleted
> document may yield unspecified results. Usually this is not
> required, however you can test if the doc is deleted by checking the
> Bits returned from
> MultiBits.getLiveDocs(org.apache.lucene.index.IndexReader).

As a result, the browse index can contain headings taken from deleted
documents, and those headings don't link to any record.

Adds a check against the live docs bitset, as described in the
documentation above.
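The filtering rule can be sketched without Lucene on the classpath. The example below is a simplified model, not the committed code: `java.util.BitSet` stands in for Lucene's `Bits`, and the class and method names (`LiveDocsSketch`, `isLive`, `liveDocIds`) are hypothetical. It only illustrates the convention that a null live-docs bitset means the index has no deletions, so every document id below `maxDoc` is loaded.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Simplified model of the live-docs check; BitSet stands in for Lucene's Bits.
final class LiveDocsSketch {

    // A null bitset means the index contains no deletes, so every doc is live.
    static boolean isLive(BitSet liveDocs, int docId) {
        return liveDocs == null || liveDocs.get(docId);
    }

    // Returns the doc ids a sequential reader loop would load, skipping deleted ones.
    static List<Integer> liveDocIds(BitSet liveDocs, int maxDoc) {
        List<Integer> ids = new ArrayList<>();
        for (int docId = 0; docId < maxDoc; docId++) {
            if (isLive(liveDocs, docId)) {
                ids.add(docId);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        BitSet liveDocs = new BitSet(5);
        liveDocs.set(0, 5);  // docs 0..4 start out live
        liveDocs.clear(2);   // doc 2 has been deleted

        System.out.println(liveDocIds(liveDocs, 5)); // prints [0, 1, 3, 4]
        System.out.println(liveDocIds(null, 3));     // prints [0, 1, 2]
    }
}
```

The null check comes first so that an index without deletions takes the same path as before the change and loads every document.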
marktriggs committed Oct 31, 2023
1 parent a9fc247 commit ad90085
Showing 1 changed file with 9 additions and 1 deletion.
10 changes: 9 additions & 1 deletion browse-indexing/StoredFieldLeech.java
@@ -6,6 +6,7 @@
 import org.apache.lucene.store.*;
 import org.apache.lucene.index.*;
 import org.apache.lucene.document.*;
+import org.apache.lucene.util.Bits;

 import org.vufind.util.Utils;
 import org.vufind.util.BrowseEntry;
@@ -20,6 +21,7 @@ public class StoredFieldLeech extends Leech

     private Set<String> fieldSelection;

+    private Bits liveDocsBitSet;

     public StoredFieldLeech(String indexPath, String field) throws Exception
     {
@@ -40,6 +42,10 @@ public StoredFieldLeech(String indexPath, String field) throws Exception
         fieldSelection.add("id"); // make Solr id available for error messages

         reader = DirectoryReader.open(FSDirectory.open(new File(indexPath).toPath()));
+
+        // Will be null if the index contains no deletes.
+        liveDocsBitSet = MultiBits.getLiveDocs(reader);
+
         buffer = new LinkedList<BrowseEntry> ();
     }

@@ -80,7 +86,9 @@ public BrowseEntry next() throws Exception
     {
         while (buffer.isEmpty()) {
             if (currentDoc < reader.maxDoc()) {
-                loadDocument(reader, currentDoc);
+                if (this.liveDocsBitSet == null || this.liveDocsBitSet.get(currentDoc)) {
+                    loadDocument(reader, currentDoc);
+                }
                 currentDoc++;
             } else {
                 return null;