Search Parameter Highlighting in PDF.js
by Kaz Smith
2015/08/27
Introduction
Mozilla’s PDF.js is a great utility for rendering PDFs in the browser. PDF.js is very versatile, and can be used to display a PDF anywhere in a webpage. At Mazira, we’re developing GoldFynch, a web application which lawyers will use to organize, search, and process the large volume of case documents they handle. Since GoldFynch handles lots of documents, PDF rendering support within the application is very important. We chose to use PDF.js to render PDFs (and documents converted to PDF) in GoldFynch. We’ve been very happy with the capabilities of PDF.js.
The Problem
However, PDF.js is a generalized application designed to provide a commonly used subset of tools for PDF rendering and processing. It can’t fit everyone’s needs, and it didn’t perfectly fit our needs. As we’ve developed GoldFynch here at Mazira, a number of lawyers have requested the ability to use GoldFynch to open attachments displayed in PDF versions of emails. This is a fairly specific problem, so it’s not handled by PDF.js. We had to develop our own solution. One important part of turning these attachment filenames into links is highlighting them to indicate that they are clickable. I was tasked with editing PDF.js to handle a “search” parameter in the URL used to open the PDF. This search parameter would then be highlighted in the PDF. With this solution, we could easily draw attention to any part of the PDF.
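As a rough illustration of the starting point, the search term has to be read from the viewer URL before anything can be highlighted. The post does not show how that was done, so the following is only a sketch: the function name getSearchTerm and the example URLs are hypothetical, and it assumes the term is passed as an ordinary query-string parameter named search. It uses the standard URL and URLSearchParams APIs.

```javascript
// Hypothetical helper: read the "search" parameter from a viewer URL.
// Assumes the term is passed in the query string, e.g. ?search=report.pdf
function getSearchTerm(viewerUrl) {
  var url = new URL(viewerUrl);
  // get() returns the percent-decoded value, or null if the parameter is absent.
  return url.searchParams.get('search');
}

// Example URLs are made up for illustration.
console.log(getSearchTerm('https://example.com/viewer.html?file=email.pdf&search=report.pdf'));
// report.pdf
console.log(getSearchTerm('https://example.com/viewer.html?file=email.pdf'));
// null
```

In a browser, the same function could be called with window.location.href.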
The Solution (highlighting stuff)
Within PDF.js, I had to make a few changes to viewer.js. I decided the best way to do the highlighting was to work alongside the find functionality in PDF.js. Under the hood, PDF.js contains lots of small divs which contain parts of the text of the PDF. Mirroring the find functionality would give me access to these divs. However, I knew I couldn’t just trigger the find bar to search for the URL search parameter, since that would cause the highlighting to be removed when the user used the find bar to search for something else.
When you use the find bar in PDF.js, it just searches the page text for matches to your query. I copied this idea, and saved a separate set of matches for the URL search parameter. The actual displaying of the highlighting is where it gets a bit more interesting. I needed the highlighting to stay there and to not interfere with the find bar functionality. I decided to change the background of the various divs containing pieces of the PDF text to a transparent yellow to act as highlighting. However, these divs might contain more than just the text I wanted to highlight. They could also contain only part of the text I wanted to highlight. To deal with this issue, I decided to use CSS gradients.
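To make the idea of a separate set of matches concrete, here is a minimal sketch of how such matches could be computed. It is not the PDF.js find code: the function name findParMatches is hypothetical, the search is case-sensitive for simplicity, and the input is assumed to be a plain array holding the text of each div in order. The output uses the same begin/end shape (divIdx plus offset) that the code in this post reads.

```javascript
// Hypothetical helper: find every occurrence of `query` in the page text and
// describe each one as begin/end positions inside the text divs.
// `strs` is assumed to hold the text of each div, in document order.
function findParMatches(strs, query) {
  var matches = [];
  if (!query) {
    return matches;
  }
  var pageText = strs.join('');

  // starts[i] is the index in pageText at which the text of div i begins.
  var starts = [];
  var total = 0;
  for (var i = 0; i < strs.length; i++) {
    starts.push(total);
    total += strs[i].length;
  }

  // Index of the div that contains the character at charIndex.
  function divOfChar(charIndex) {
    for (var d = 0; d < strs.length; d++) {
      if (charIndex < starts[d] + strs[d].length) {
        return d;
      }
    }
    return -1;
  }

  var idx = pageText.indexOf(query);
  while (idx !== -1) {
    var endIdx = idx + query.length; // exclusive
    var beginDiv = divOfChar(idx);
    var endDiv = divOfChar(endIdx - 1);
    matches.push({
      begin: { divIdx: beginDiv, offset: idx - starts[beginDiv] },
      end: { divIdx: endDiv, offset: endIdx - starts[endDiv] }
    });
    idx = pageText.indexOf(query, endIdx);
  }
  return matches;
}
```

For the divs ['See attached ', 'report', '.pdf for details'] and the query 'report.pdf', this returns one match that begins at offset 0 of div 1 and ends at offset 4 of div 2, which is exactly the multi-div situation the gradients have to handle.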
Here’s the first part of my code:
var k0 = 0, k1 = parMatches.length;
for (var k = k0; k < k1; k++) {
  var parMatch = parMatches[k];
  var begin = parMatch.begin;
  var end = parMatch.end;
  // Where the match starts within its div, as a percentage of the div's text length.
  var hlBeginPercent = begin.offset / bidiTexts[begin.divIdx].str.length * 100;
  hlBeginPercent = hlBeginPercent.toFixed(1);
  // Where the match ends within its div, computed the same way.
  var hlEndPercent = end.offset / bidiTexts[end.divIdx].str.length * 100;
  hlEndPercent = hlEndPercent.toFixed(1);
  ...
I loop through the text matches; parMatch is one specific match. begin describes where the match starts and end describes where it ends: each holds the index of a text div (divIdx) and a character offset into that div's text (offset). Next I calculate a couple of percentages. The first, hlBeginPercent, represents the position in the begin div at which the highlighting should start. begin.offset is the number of characters into the begin div at which the match starts, and bidiTexts is an array holding the text of each text div, so bidiTexts[begin.divIdx].str.length is the length of the text in the begin div. Therefore hlBeginPercent gets set to the position in the begin div at which the match text starts, calculated as a percentage. hlEndPercent is essentially the same, but it represents the position in the end div at which the match text ends.
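A small worked example of that arithmetic may help. The helper name offsetToPercent is hypothetical; it simply repeats the calculation described above on plain numbers.

```javascript
// Hypothetical helper mirroring the percentage calculation described above.
// Note that toFixed returns a string, which is convenient for building CSS.
function offsetToPercent(offset, textLength) {
  return (offset / textLength * 100).toFixed(1);
}

console.log(offsetToPercent(5, 20)); // "25.0": the match starts a quarter of the way in
console.log(offsetToPercent(1, 3));  // "33.3": rounded to one decimal place
```

Because the percentage is based on character counts rather than rendered widths, the highlight edge is only approximate when the font is proportional.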
Now I actually apply the gradients:
  if (begin.divIdx === end.divIdx) {
    // Color stops only; the direction is prepended separately for each syntax.
    var beginStr = 'rgba(0, 0, 0, 0) 0%, ' +
      'rgba(0, 0, 0, 0) ' + hlBeginPercent + '%, ' +
      'rgba(255, 255, 0, 0.9) ' + hlBeginPercent + '%, ' +
      'rgba(255, 255, 0, 0.9) ' + hlEndPercent + '%, ' +
      'rgba(0, 0, 0, 0) ' + hlEndPercent + '%, ' +
      'rgba(0, 0, 0, 0) 100%)';
    // A value the browser cannot parse is ignored, so the last supported syntax wins.
    textDivs[begin.divIdx].style.background = '-webkit-linear-gradient(left, ' + beginStr;
    textDivs[begin.divIdx].style.background = '-moz-linear-gradient(left, ' + beginStr;
    textDivs[begin.divIdx].style.background = '-ms-linear-gradient(left, ' + beginStr;
    textDivs[begin.divIdx].style.background = '-o-linear-gradient(left, ' + beginStr;
    // The unprefixed syntax names the direction as 'to right' rather than 'left'.
    textDivs[begin.divIdx].style.background = 'linear-gradient(to right, ' + beginStr;
  } else {
    ...
If the match text is all contained in one text div, it’s not too complicated. It helps to look up the syntax for CSS gradients if you haven’t used them before. My gradient sets the background of the text div to be transparent until hlBeginPercent% of the way into the div, then to be a slightly transparent yellow until hlEndPercent% of the way into the div, then to be transparent again until the end of the div. This results in the div being a slightly transparent yellow where the match text is, which makes the match text look highlighted.
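The sharp edges of the highlight come from placing two color stops at the same position, so the color switches instantly instead of fading. The sketch below builds such a gradient string in the standard unprefixed syntax, which expresses a left-to-right gradient as "to right". The function name highlightGradient is hypothetical and is not part of PDF.js.

```javascript
// Hypothetical helper: build a hard-edged horizontal gradient that is yellow
// between startPercent and endPercent and transparent everywhere else.
function highlightGradient(startPercent, endPercent) {
  var clear = 'rgba(0, 0, 0, 0)';
  var yellow = 'rgba(255, 255, 0, 0.9)';
  var stops = [
    clear + ' 0%',
    clear + ' ' + startPercent + '%',
    yellow + ' ' + startPercent + '%', // same position as the stop before: hard edge
    yellow + ' ' + endPercent + '%',
    clear + ' ' + endPercent + '%',    // same position again: hard edge back to clear
    clear + ' 100%'
  ].join(', ');
  return 'linear-gradient(to right, ' + stops + ')';
}

console.log(highlightGradient('25.0', '50.0'));
```

The returned string can be assigned to an element's style.background, in the same way the snippets in this post assign their gradient strings.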
Lastly, here’s the code used when the match text is contained in multiple divs:
  } else {
    // First div: transparent up to the match start, then yellow to the end of the div.
    var beginStr = 'rgba(0, 0, 0, 0) 0%, ' +
      'rgba(0, 0, 0, 0) ' + hlBeginPercent + '%, ' +
      'rgba(255, 255, 0, 0.9) ' + hlBeginPercent + '%, ' +
      'rgba(255, 255, 0, 0.9) 100%)';
    textDivs[begin.divIdx].style.background = '-webkit-linear-gradient(left, ' + beginStr;
    textDivs[begin.divIdx].style.background = '-moz-linear-gradient(left, ' + beginStr;
    textDivs[begin.divIdx].style.background = '-ms-linear-gradient(left, ' + beginStr;
    textDivs[begin.divIdx].style.background = '-o-linear-gradient(left, ' + beginStr;
    // The unprefixed syntax names the direction as 'to right' rather than 'left'.
    textDivs[begin.divIdx].style.background = 'linear-gradient(to right, ' + beginStr;
    // Divs strictly between the first and last are fully highlighted.
    for (var midDivIdx = begin.divIdx + 1; midDivIdx < end.divIdx; ++midDivIdx) {
      textDivs[midDivIdx].style.background = 'rgba(255, 255, 0, 0.9)';
    }
    // Last div: yellow from the start of the div up to the match end, then transparent.
    var endStr = 'rgba(255, 255, 0, 0.9) 0%, ' +
      'rgba(255, 255, 0, 0.9) ' + hlEndPercent + '%, ' +
      'rgba(0, 0, 0, 0) ' + hlEndPercent + '%, ' +
      'rgba(0, 0, 0, 0) 100%)';
    textDivs[end.divIdx].style.background = '-webkit-linear-gradient(left, ' + endStr;
    textDivs[end.divIdx].style.background = '-moz-linear-gradient(left, ' + endStr;
    textDivs[end.divIdx].style.background = '-ms-linear-gradient(left, ' + endStr;
    textDivs[end.divIdx].style.background = '-o-linear-gradient(left, ' + endStr;
    textDivs[end.divIdx].style.background = 'linear-gradient(to right, ' + endStr;
  }
The first div gets highlighted from hlBeginPercent% of the way into the div to the end. Any divs in the middle get completely highlighted. The last div gets highlighted from the beginning through hlEndPercent% of the way into the div.
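The single-div and multi-div cases can also be viewed as one rule: every div touched by the match gets a highlighted range, which starts at 0% unless it is the first div and ends at 100% unless it is the last. The following sketch expresses that rule. The function name divRanges and the textLengths array (the text length of each div) are hypothetical stand-ins, not PDF.js names.

```javascript
// Hypothetical helper: list the highlighted range, in percent, for each div
// touched by one match. `textLengths[i]` is the text length of div i.
function divRanges(match, textLengths) {
  var begin = match.begin;
  var end = match.end;
  var ranges = [];
  for (var d = begin.divIdx; d <= end.divIdx; d++) {
    ranges.push({
      divIdx: d,
      // Only the first div starts partway in; later divs start at 0%.
      from: d === begin.divIdx ? begin.offset / textLengths[d] * 100 : 0,
      // Only the last div stops partway; earlier divs run to 100%.
      to: d === end.divIdx ? end.offset / textLengths[d] * 100 : 100
    });
  }
  return ranges;
}

var match = { begin: { divIdx: 1, offset: 0 }, end: { divIdx: 2, offset: 4 } };
console.log(divRanges(match, [13, 6, 16]));
// [ { divIdx: 1, from: 0, to: 100 }, { divIdx: 2, from: 0, to: 25 } ]
```

Each range could then be turned into a gradient in the way described above, which would avoid writing the gradient code separately for each case.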
That’s all! Here’s the full block of code (remember to make it cross-browser!):
var k0 = 0, k1 = parMatches.length;
for (var k = k0; k < k1; k++) {
  var parMatch = parMatches[k];
  var begin = parMatch.begin;
  var end = parMatch.end;
  // Match start and end positions within their divs, as percentages of the text length.
  var hlBeginPercent = begin.offset / bidiTexts[begin.divIdx].str.length * 100;
  hlBeginPercent = hlBeginPercent.toFixed(1);
  var hlEndPercent = end.offset / bidiTexts[end.divIdx].str.length * 100;
  hlEndPercent = hlEndPercent.toFixed(1);
  if (begin.divIdx === end.divIdx) {
    // Color stops only; the direction is prepended separately for each syntax.
    var beginStr = 'rgba(0, 0, 0, 0) 0%, ' +
      'rgba(0, 0, 0, 0) ' + hlBeginPercent + '%, ' +
      'rgba(255, 255, 0, 0.9) ' + hlBeginPercent + '%, ' +
      'rgba(255, 255, 0, 0.9) ' + hlEndPercent + '%, ' +
      'rgba(0, 0, 0, 0) ' + hlEndPercent + '%, ' +
      'rgba(0, 0, 0, 0) 100%)';
    // A value the browser cannot parse is ignored, so the last supported syntax wins.
    textDivs[begin.divIdx].style.background = '-webkit-linear-gradient(left, ' + beginStr;
    textDivs[begin.divIdx].style.background = '-moz-linear-gradient(left, ' + beginStr;
    textDivs[begin.divIdx].style.background = '-ms-linear-gradient(left, ' + beginStr;
    textDivs[begin.divIdx].style.background = '-o-linear-gradient(left, ' + beginStr;
    // The unprefixed syntax names the direction as 'to right' rather than 'left'.
    textDivs[begin.divIdx].style.background = 'linear-gradient(to right, ' + beginStr;
  } else {
    // First div: transparent up to the match start, then yellow to the end of the div.
    var beginStr = 'rgba(0, 0, 0, 0) 0%, ' +
      'rgba(0, 0, 0, 0) ' + hlBeginPercent + '%, ' +
      'rgba(255, 255, 0, 0.9) ' + hlBeginPercent + '%, ' +
      'rgba(255, 255, 0, 0.9) 100%)';
    textDivs[begin.divIdx].style.background = '-webkit-linear-gradient(left, ' + beginStr;
    textDivs[begin.divIdx].style.background = '-moz-linear-gradient(left, ' + beginStr;
    textDivs[begin.divIdx].style.background = '-ms-linear-gradient(left, ' + beginStr;
    textDivs[begin.divIdx].style.background = '-o-linear-gradient(left, ' + beginStr;
    textDivs[begin.divIdx].style.background = 'linear-gradient(to right, ' + beginStr;
    // Divs strictly between the first and last are fully highlighted.
    for (var midDivIdx = begin.divIdx + 1; midDivIdx < end.divIdx; ++midDivIdx) {
      textDivs[midDivIdx].style.background = 'rgba(255, 255, 0, 0.9)';
    }
    // Last div: yellow from the start of the div up to the match end, then transparent.
    var endStr = 'rgba(255, 255, 0, 0.9) 0%, ' +
      'rgba(255, 255, 0, 0.9) ' + hlEndPercent + '%, ' +
      'rgba(0, 0, 0, 0) ' + hlEndPercent + '%, ' +
      'rgba(0, 0, 0, 0) 100%)';
    textDivs[end.divIdx].style.background = '-webkit-linear-gradient(left, ' + endStr;
    textDivs[end.divIdx].style.background = '-moz-linear-gradient(left, ' + endStr;
    textDivs[end.divIdx].style.background = '-ms-linear-gradient(left, ' + endStr;
    textDivs[end.divIdx].style.background = '-o-linear-gradient(left, ' + endStr;
    textDivs[end.divIdx].style.background = 'linear-gradient(to right, ' + endStr;
  }
}
And here’s a screenshot of what the highlighting in PDF.js looks like (note the yellow highlighting):
