Search Parameter Highlighting in PDF.js

by Kaz Smith
2015/08/27

Introduction

Mozilla’s PDF.js is a great utility for rendering PDFs in the browser. PDF.js is very versatile, and can be used to display a PDF anywhere in a webpage. At Mazira, we’re developing GoldFynch, a web application which lawyers will use to organize, search, and process the large volume of case documents they handle. Since GoldFynch handles lots of documents, PDF rendering support within the application is very important. We chose to use PDF.js to render PDFs (and documents converted to PDF) in GoldFynch. We’ve been very happy with the capabilities of PDF.js.

The Problem

However, PDF.js is a generalized application designed to provide a commonly used subset of tools for PDF rendering and processing. It can’t fit everyone’s needs, and it didn’t perfectly fit our needs. As we’ve developed GoldFynch here at Mazira, a number of lawyers have requested the ability to use GoldFynch to open attachments displayed in PDF versions of emails. This is a fairly specific problem, so it’s not handled by PDF.js. We had to develop our own solution. One important part of turning these attachment filenames into links is highlighting them to indicate that they are clickable. I was tasked with editing PDF.js to handle a “search” parameter in the URL used to open the PDF. This search parameter would then be highlighted in the PDF. With this solution, we could easily draw attention to any part of the PDF.

The Solution (highlighting stuff)

Within PDF.js, I had to make a few changes to viewer.js. I decided the best way to do the highlighting was to work alongside the find functionality in PDF.js. Under the hood, PDF.js contains lots of small divs which contain parts of the text of the PDF. Mirroring the find functionality would give me access to these divs. However, I knew I couldn’t just trigger the find bar to search for the URL search parameter, since that would cause the highlighting to be removed when the user used the find bar to search for something else.
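The post does not show how the search parameter is read from the URL. A minimal sketch of one way to do it, assuming the parameter is passed in the query string under the name `search`; the helper name `getSearchParameter` and the example URL are hypothetical:

```javascript
// Hypothetical helper: the original post does not show this step.
// Assumes the viewer is opened with a URL such as
//   https://example.com/viewer.html?file=doc.pdf&search=invoice.pdf
function getSearchParameter(viewerUrl) {
  // URL and searchParams are standard in browsers and in Node.js
  var params = new URL(viewerUrl).searchParams;
  var value = params.get('search'); // null when the parameter is absent
  return value === null || value === '' ? null : value;
}
```

In the browser this could be called as `getSearchParameter(window.location.href)`. Percent-encoded characters in the value are decoded by `searchParams.get`.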

When you use the find bar in PDF.js, it just searches the page text for matches to your query. I copied this idea, and saved a separate set of matches for the URL search parameter. The actual displaying of the highlighting is where it gets a bit more interesting. I needed the highlighting to stay there and to not interfere with the find bar functionality. I decided to change the background of the various divs containing pieces of the PDF text to a transparent yellow to act as highlighting. However, these divs might contain more than just the text I wanted to highlight. They could also contain only part of the text I wanted to highlight. To deal with this issue, I decided to use CSS gradients.
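How the separate set of matches is built is not shown in this post. As a rough sketch of the idea (not the PDF.js code), the function below finds each occurrence of a query in the joined page text and records where it begins and ends in terms of a div index and a character offset. The names `findParMatches` and `divStrings` are hypothetical, and matching is case-sensitive to keep the example short:

```javascript
// Hypothetical sketch: divStrings holds the text of each div, in page order.
// Each match is returned as
//   { begin: { divIdx, offset }, end: { divIdx, offset } }
function findParMatches(divStrings, query) {
  var matches = [];
  if (!query) { return matches; }
  var pageText = divStrings.join('');
  var divIdx = 0;   // div currently under the cursor
  var divStart = 0; // index in pageText at which that div begins
  var from = 0;
  var matchIdx;
  while ((matchIdx = pageText.indexOf(query, from)) !== -1) {
    // Advance to the div that contains the first matched character.
    while (matchIdx >= divStart + divStrings[divIdx].length) {
      divStart += divStrings[divIdx].length;
      divIdx++;
    }
    var begin = { divIdx: divIdx, offset: matchIdx - divStart };
    // Advance to the div that contains the last matched character.
    var matchEnd = matchIdx + query.length;
    while (matchEnd > divStart + divStrings[divIdx].length) {
      divStart += divStrings[divIdx].length;
      divIdx++;
    }
    var end = { divIdx: divIdx, offset: matchEnd - divStart };
    matches.push({ begin: begin, end: end });
    from = matchEnd; // continue after this match (no overlapping matches)
  }
  return matches;
}
```

For the divs `['See attached ', 'report', '.pdf for details']` and the query `'report.pdf'`, the single match begins at offset 0 of div 1 and ends at offset 4 of div 2.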

Here’s the first part of my code:

var k0 = 0, k1 = parMatches.length;
for (var k = k0; k < k1; k++) {
    var parMatch = parMatches[k];
    var begin = parMatch.begin;
    var end = parMatch.end;

    var hlBeginPercent = begin.offset / bidiTexts[begin.divIdx].str.length * 100;
    hlBeginPercent = hlBeginPercent.toFixed(1);
    var hlEndPercent = end.offset / bidiTexts[end.divIdx].str.length * 100;
    hlEndPercent = hlEndPercent.toFixed(1);
    ...

I loop through the text matches; parMatch represents a specific match. begin and end describe where the match starts and ends: each holds the index of a text div (divIdx) and a character offset into that div’s text (offset). Next I calculate a couple of percentages. The first, hlBeginPercent, represents the position in the begin div at which the highlighting should start. begin.offset is the number of characters into the begin div at which the match starts, and bidiTexts is an array with one entry per text div, whose str property holds that div’s text. So, bidiTexts[begin.divIdx].str.length is the length of the text in the begin div. Therefore hlBeginPercent gets set to the position in the begin div at which the match text starts, calculated as a percentage and rounded to one decimal place. hlEndPercent is calculated the same way, but it represents the position in the end div at which the match text ends.
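A small worked example of the percentage calculation may help. The helper name `offsetToPercent` is hypothetical, and the guard for an empty div is an addition that is not in the code above:

```javascript
// Hypothetical helper mirroring the calculation above.
function offsetToPercent(offset, textLength) {
  if (textLength === 0) { return '0.0'; } // avoid dividing by zero for an empty div
  // toFixed(1) returns a string with one decimal place
  return (offset / textLength * 100).toFixed(1);
}

// A match covering characters 4 to 12 of a 16-character div:
var exampleBegin = offsetToPercent(4, 16);  // '25.0'
var exampleEnd = offsetToPercent(12, 16);   // '75.0'
```

Because the percentage is based on character counts, it treats every character as equally wide, so in a proportional font the edges of the highlight are only approximate.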

Now I actually apply the gradients:

if (begin.divIdx === end.divIdx) { // the string to be highlighted is all in one div
    var beginStr = '(left, ' +
                    'rgba(0, 0, 0, 0) 0%, ' +
                    'rgba(0, 0, 0, 0) ' + hlBeginPercent + '%, ' +
                    'rgba(255, 255, 0, 0.9) ' + hlBeginPercent + '%, ' +
                    'rgba(255, 255, 0, 0.9) ' + hlEndPercent + '%, ' +
                    'rgba(0, 0, 0, 0) ' + hlEndPercent + '%, ' +
                    'rgba(0, 0, 0, 0) 100%)';
    textDivs[begin.divIdx].style.background = '-webkit-linear-gradient' + beginStr;
    textDivs[begin.divIdx].style.background = '-moz-linear-gradient' + beginStr;
    textDivs[begin.divIdx].style.background = '-ms-linear-gradient' + beginStr;
    textDivs[begin.divIdx].style.background = '-o-linear-gradient' + beginStr;
    // the unprefixed syntax takes 'to right' rather than 'left'
    textDivs[begin.divIdx].style.background = 'linear-gradient' + beginStr.replace('(left,', '(to right,');
} else {
    ...

If the match text is all contained in one text div, it’s not too complicated. (The CSS documentation covers the linear-gradient syntax.) My gradient sets the background of the text div to be transparent until hlBeginPercent% of the way into the div, then to be a slightly transparent yellow until hlEndPercent% of the way into the div, then to be transparent again until the end of the div. Because two adjacent color stops share the same position, the color changes abruptly instead of fading. This results in the div being a slightly transparent yellow where the match text is, which makes the match text look highlighted.
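The stop list can also be built by a small function, which keeps the string concatenation in one place. This is a sketch rather than the code used in GoldFynch; `highlightGradient`, `HIGHLIGHT` and `CLEAR` are hypothetical names. It emits the unprefixed form, where the direction is written `to right` (the older prefixed forms use `left` for the same left-to-right direction):

```javascript
var HIGHLIGHT = 'rgba(255, 255, 0, 0.9)'; // slightly transparent yellow
var CLEAR = 'rgba(0, 0, 0, 0)';           // fully transparent

// Hypothetical helper: returns a CSS background value that is transparent
// up to beginPercent, yellow up to endPercent, and transparent after that.
function highlightGradient(beginPercent, endPercent) {
  var stops = [
    CLEAR + ' 0%',
    CLEAR + ' ' + beginPercent + '%',
    HIGHLIGHT + ' ' + beginPercent + '%', // same position: hard edge, no fade
    HIGHLIGHT + ' ' + endPercent + '%',
    CLEAR + ' ' + endPercent + '%',
    CLEAR + ' 100%'
  ];
  return 'linear-gradient(to right, ' + stops.join(', ') + ')';
}
```

The result could then be assigned with something like `textDivs[begin.divIdx].style.background = highlightGradient(hlBeginPercent, hlEndPercent);`.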

Lastly, here’s the code used when the match text is contained in multiple divs:

} else { // the string to be highlighted is contained in multiple divs
    // the first div
    var beginStr = '(left, ' +
                  'rgba(0, 0, 0, 0) 0%, ' +
                  'rgba(0, 0, 0, 0) ' + hlBeginPercent + '%, ' +
                  'rgba(255, 255, 0, 0.9) ' + hlBeginPercent + '%, ' +
                  'rgba(255, 255, 0, 0.9) 100%)';
    textDivs[begin.divIdx].style.background = '-webkit-linear-gradient' + beginStr;
    textDivs[begin.divIdx].style.background = '-moz-linear-gradient' + beginStr;
    textDivs[begin.divIdx].style.background = '-ms-linear-gradient' + beginStr;
    textDivs[begin.divIdx].style.background = '-o-linear-gradient' + beginStr;
    // the unprefixed syntax takes 'to right' rather than 'left'
    textDivs[begin.divIdx].style.background = 'linear-gradient' + beginStr.replace('(left,', '(to right,');

    // any divs in between the first and last divs
    // in which the string to be highlighted is contained
    for (var midDivIdx = begin.divIdx + 1; midDivIdx < end.divIdx; ++midDivIdx) {
        textDivs[midDivIdx].style.background = 'rgba(255, 255, 0, 0.9)';
    }

    // the last div
    var endStr =  '(left, ' +
                  'rgba(255, 255, 0, 0.9) 0%, ' +
                  'rgba(255, 255, 0, 0.9) ' + hlEndPercent + '%, ' +
                  'rgba(0, 0, 0, 0) ' + hlEndPercent + '%, ' +
                  'rgba(0, 0, 0, 0) 100%)';
    textDivs[end.divIdx].style.background = '-webkit-linear-gradient' + endStr;
    textDivs[end.divIdx].style.background = '-moz-linear-gradient' + endStr;
    textDivs[end.divIdx].style.background = '-ms-linear-gradient' + endStr;
    textDivs[end.divIdx].style.background = '-o-linear-gradient' + endStr;
    // the unprefixed syntax takes 'to right' rather than 'left'
    textDivs[end.divIdx].style.background = 'linear-gradient' + endStr.replace('(left,', '(to right,');
}

The first div gets highlighted from hlBeginPercent% of the way into the div to the end. Any divs in the middle get completely highlighted. The last div gets highlighted from the beginning through hlEndPercent% of the way into the div.
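The three cases can also be reduced to a list of per-div ranges before any styling is applied, which makes the logic easy to check without a browser. A sketch under that assumption; `highlightRanges` and `divLengths` (the text length of each div) are hypothetical names:

```javascript
// Hypothetical helper: for one match, list the divs to highlight and the
// portion of each (as percentage strings) that should be yellow.
function highlightRanges(match, divLengths) {
  function percent(offset, divIdx) {
    return (offset / divLengths[divIdx] * 100).toFixed(1);
  }
  var begin = match.begin;
  var end = match.end;
  var first = percent(begin.offset, begin.divIdx);
  var last = percent(end.offset, end.divIdx);

  // The match is all in one div.
  if (begin.divIdx === end.divIdx) {
    return [{ divIdx: begin.divIdx, from: first, to: last }];
  }

  // First div: from the start of the match to the end of the div.
  var ranges = [{ divIdx: begin.divIdx, from: first, to: '100.0' }];
  // Divs in between: highlighted completely.
  for (var i = begin.divIdx + 1; i < end.divIdx; i++) {
    ranges.push({ divIdx: i, from: '0.0', to: '100.0' });
  }
  // Last div: from the start of the div to the end of the match.
  ranges.push({ divIdx: end.divIdx, from: '0.0', to: last });
  return ranges;
}
```

For a match that begins at offset 5 of a 10-character div, spans a whole 5-character div, and ends at offset 2 of an 8-character div, this yields the ranges 50.0 to 100.0, 0.0 to 100.0, and 0.0 to 25.0.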

That’s all! Here’s the full block of code (remember to make it cross-browser!):

var k0 = 0, k1 = parMatches.length;
for (var k = k0; k < k1; k++) {
    var parMatch = parMatches[k];
    var begin = parMatch.begin;
    var end = parMatch.end;

    var hlBeginPercent = begin.offset / bidiTexts[begin.divIdx].str.length * 100;
    hlBeginPercent = hlBeginPercent.toFixed(1);
    var hlEndPercent = end.offset / bidiTexts[end.divIdx].str.length * 100;
    hlEndPercent = hlEndPercent.toFixed(1);
    if (begin.divIdx === end.divIdx) { // the string to be highlighted is all in one div
        var beginStr = '(left, ' +
                        'rgba(0, 0, 0, 0) 0%, ' +
                        'rgba(0, 0, 0, 0) ' + hlBeginPercent + '%, ' +
                        'rgba(255, 255, 0, 0.9) ' + hlBeginPercent + '%, ' +
                        'rgba(255, 255, 0, 0.9) ' + hlEndPercent + '%, ' +
                        'rgba(0, 0, 0, 0) ' + hlEndPercent + '%, ' +
                        'rgba(0, 0, 0, 0) 100%)';
        textDivs[begin.divIdx].style.background = '-webkit-linear-gradient' + beginStr;
        textDivs[begin.divIdx].style.background = '-moz-linear-gradient' + beginStr;
        textDivs[begin.divIdx].style.background = '-ms-linear-gradient' + beginStr;
        textDivs[begin.divIdx].style.background = '-o-linear-gradient' + beginStr;
        // the unprefixed syntax takes 'to right' rather than 'left'
        textDivs[begin.divIdx].style.background = 'linear-gradient' + beginStr.replace('(left,', '(to right,');
    } else { // the string to be highlighted is contained in multiple divs
        // the first div
        var beginStr = '(left, ' +
                        'rgba(0, 0, 0, 0) 0%, ' +
                        'rgba(0, 0, 0, 0) ' + hlBeginPercent + '%, ' +
                        'rgba(255, 255, 0, 0.9) ' + hlBeginPercent + '%, ' +
                        'rgba(255, 255, 0, 0.9) 100%)';
        textDivs[begin.divIdx].style.background = '-webkit-linear-gradient' + beginStr;
        textDivs[begin.divIdx].style.background = '-moz-linear-gradient' + beginStr;
        textDivs[begin.divIdx].style.background = '-ms-linear-gradient' + beginStr;
        textDivs[begin.divIdx].style.background = '-o-linear-gradient' + beginStr;
        // the unprefixed syntax takes 'to right' rather than 'left'
        textDivs[begin.divIdx].style.background = 'linear-gradient' + beginStr.replace('(left,', '(to right,');

        // any divs in between the first and last divs
        // in which the string to be highlighted is contained
        for (var midDivIdx = begin.divIdx + 1; midDivIdx < end.divIdx; ++midDivIdx) {
            textDivs[midDivIdx].style.background = 'rgba(255, 255, 0, 0.9)';
        }

        // the last div
        var endStr =  '(left, ' +
                      'rgba(255, 255, 0, 0.9) 0%, ' +
                      'rgba(255, 255, 0, 0.9) ' + hlEndPercent + '%, ' +
                      'rgba(0, 0, 0, 0) ' + hlEndPercent + '%, ' +
                      'rgba(0, 0, 0, 0) 100%)';
        textDivs[end.divIdx].style.background = '-webkit-linear-gradient' + endStr;
        textDivs[end.divIdx].style.background = '-moz-linear-gradient' + endStr;
        textDivs[end.divIdx].style.background = '-ms-linear-gradient' + endStr;
        textDivs[end.divIdx].style.background = '-o-linear-gradient' + endStr;
        // the unprefixed syntax takes 'to right' rather than 'left'
        textDivs[end.divIdx].style.background = 'linear-gradient' + endStr.replace('(left,', '(to right,');
    }
}

And here’s a screenshot of what the highlighting in PDF.js looks like (note the yellow highlighting):

[Screenshot: PDF.js highlighting]

Copyright © 2014 Mazira, LLC
All rights reserved.