Comparing ebook collections

We more than doubled the size of our collection by subscribing to a (if you’ll pardon the term) “bulk” collection from an ebook vendor a couple of years ago.  It’s a cost-effective way to get access to a lot of material which will be available even when the building is closed or the user is some distance away.  It had over 70,000 records to start and, at the time of comparison, was up to 88,907.  Let’s call that Collection A.

These “bulk” purchases come in various packages of differing sizes, and can either specialize in subject areas, or be — as we selected — an “Academic” collection, which may or may not hold some or all of the specialized collections.

We’re not above looking at alternatives, however, so we’ve done some comparisons with an offering from another vendor, where the Academic collection is presently shown as 119,757 (based on a spreadsheet download of titles at the time of comparison).  Let’s call that Collection (wait for it…) “B.”

So, how do we evaluate such a massive number of records?  And how do we do it as fairly as possible?  Given the sizes, numbers are going to have to suffice for most purposes, which is not my preferred standard, but you work with what you can manage.


I started with date ranges: bring the title lists into spreadsheets and sort by publication date.  This isn’t as accurate for comparisons as it might be, for several reasons.  For one thing, I have updates on Collection A received over the months since we got it.  Is it fair to compare the current year (partway through 2013) exactly, when the B spreadsheet may not include any titles updated since it was created for marketing?  I’d prefer to look at 2013 but not weight it as heavily as I otherwise might.

Also (and this is much harder, probably impossible, to compensate for), a number of ebooks are shown with the ebook publication date, even though the original print date is years earlier, such as a classic work that has only just become an ebook.  This skews the ebook records so they appear more recent than the contents actually are.  Add to that the fact that some publishers don’t come out with the ebook until after the print edition, and the dates are skewed further.  I can’t really do much about this, given the large numbers I’m dealing with.

Dates are also tricky in that many records have odd punctuation, “c” for copyright, brackets, and quite a few multiple dates, but this shows up only in Collection A.  Does that mean Collection B has more accurate dates, or just more of them updated to the ebook edition regardless of the original print edition?  I suspect the latter, given a few of the titles which I can tell are older but show 2013.
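For anyone who would rather script the date cleanup than do it by hand in a spreadsheet, here is a minimal sketch in Python.  It is not what I actually did; the sample strings are made up to resemble the kinds of dates described above, and taking the latest year found in a field is just one possible rule (the earliest year might sit closer to the original print date).

```python
import re

# A plausible four-digit year (1500-2099) that is not part of a longer number.
YEAR_RE = re.compile(r"(?<!\d)(1[5-9]\d{2}|20\d{2})(?!\d)")

def extract_years(raw_date):
    """Return every plausible year found in a messy date string, in order."""
    return [int(y) for y in YEAR_RE.findall(raw_date)]

def latest_year(raw_date):
    """Pick the most recent year in the string, or None if none is found."""
    years = extract_years(raw_date)
    return max(years) if years else None

# Hypothetical examples of messy date fields: copyright "c", brackets,
# multiple dates, and a record with no usable date at all.
samples = ["c2011.", "[2009]", "2012, c2008", "n.d."]
for raw in samples:
    print(repr(raw), "->", latest_year(raw))
```

Records that come back as None would still need a look by eye, so this only trims the manual work rather than replacing it.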

I do go back several years, to 2008, and compare the numbers.  Since I’ve got two different base sizes, however, I then figure by percentages, to keep it a bit more fair.
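The percentage step can be sketched as follows, assuming each collection has already been reduced to a list of publication years (with None for records that have no usable date).  The function names and the toy data are mine, for illustration only; the real figures came out of spreadsheets.

```python
from collections import Counter

def share_by_year(years, wanted):
    """Percentage of the whole collection published in each wanted year."""
    if not years:
        raise ValueError("empty collection")
    counts = Counter(years)
    total = len(years)  # undated records still count toward the total
    return {y: round(100 * counts[y] / total, 2) for y in wanted}

def share_since(years, cutoff):
    """Percentage of the whole collection published in `cutoff` or later."""
    if not years:
        raise ValueError("empty collection")
    recent = sum(1 for y in years if y is not None and y >= cutoff)
    return round(100 * recent / len(years), 2)

# Toy data only; real lists would have tens of thousands of entries.
toy_years = [2012, 2012, 2011, 1999, None]
print(share_by_year(toy_years, [2012, 2011, 2010]))
print(share_since(toy_years, 2000))
```

Dividing by the full collection size, rather than by the number of dated records, keeps the percentages comparable between two collections of different sizes.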

Collection A consists of 4.8% titles from 2012, Collection B has 3.4%.

Collection A consists of 7.15% titles from 2011, Collection B has 7.12%.

Collection A consists of 10.6% titles from 2010, Collection B has 9.44%.

Collection A consists of 8.95% titles from 2009, Collection B has 9.04%.

Collection A consists of 8.29% titles from 2008, Collection B has 7.63%.

Okay, let’s jump to a range:

Collection A consists of 88.33% titles from 2000 or later, Collection B has 77%.

So, Collection B has more titles, but year by year the two collections hold similar proportions of recent material, until we get to 2000 or later, where B falls behind.  How significant is that, given that B’s percentage is of a much larger collection?  Maybe that 11-point difference is a smaller factor than it may appear, but it should be kept in mind, at least until some more comparisons can be made.  Going by the other percentages, however, the collections aren’t far apart in recent years.
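One way to put that question in perspective is to turn the percentages back into rough title counts.  This is a back-of-the-envelope sketch using only the sizes and percentages quoted above; since B’s 77% is given without decimals, its counts are approximate, and it assumes the percentages were figured against the collection sizes at the time of comparison.

```python
# Collection sizes and "2000 or later" percentages as quoted in this post.
size_a, pct_recent_a = 88_907, 88.33
size_b, pct_recent_b = 119_757, 77.0

def estimate_count(size, pct):
    """Approximate number of titles that a percentage represents."""
    return round(size * pct / 100)

recent_a = estimate_count(size_a, pct_recent_a)
recent_b = estimate_count(size_b, pct_recent_b)
older_a = size_a - recent_a
older_b = size_b - recent_b

print(f"A: about {recent_a:,} from 2000 or later, {older_a:,} older")
print(f"B: about {recent_b:,} from 2000 or later, {older_b:,} older")
```

On these figures, B’s lower percentage still works out to more titles from 2000 or later than A has, with the rest of its extra size coming from older material.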

Now, a really in-depth comparison could take the collections and do this again by sorting first by call number, and then by date, to see how the collections stack up by subject area.  However, this is (a) more time-consuming, and (b) not necessarily as useful as might be expected.

Publishers are weird about ebooks.  You can have key publishers in certain subjects refuse to be part of third-party collections because (a) they don’t do ebooks, or (b) they do ebooks on their own, or (c) they have ebooks through this vendor but only individually, not in the collections, or (d) it’s just the wrong POTM (Phase Of The Moon).  So collections which look skimpy in a particular subject might just be caught by some of these factors, and/or by when the publisher(s) signed the contract (as in, before or after the new owners came in and made changes not already limited by that contract).  So, I’m going to pass on trying a subject-by-subject comparison.

Choice Titles

Now, instead of just calculations, come the Choice Outstanding Academic Titles lists.  My director thought this would be a good standard to apply, and it’s a very convenient one at that.

I used the three previous years: 2010, 2011, and 2012.  I sorted by title (to randomize the subjects covered a bit), and downloaded the files to spreadsheets (and I’m very glad they set up that feature on Choice).

I have to play with the titles a bit — remove the quotes, leading articles, like that.  The download is NOT alphabetized by title at this point, so next I do that.  Then I remove the unneeded columns, and add two columns on the far left for the A and B collections.
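The title cleanup above can also be scripted.  Here is a minimal sketch, assuming the titles are plain strings; the function name and the list of articles are my own choices for illustration, and the same cleanup would need to be applied to the collection title lists so the two sides match.

```python
import re

LEADING_ARTICLES = ("a ", "an ", "the ")

def normalize_title(title):
    """Lowercase a title, drop quotes and one leading article, tidy spaces."""
    t = title.strip().lower()
    t = re.sub(r"[\"'“”‘’]", "", t)  # straight and curly quotes
    for article in LEADING_ARTICLES:
        if t.startswith(article):
            t = t[len(article):]
            break
    return re.sub(r"\s+", " ", t).strip()

# Made-up titles, sorted the way the cleaned spreadsheet column would be.
titles = ["The Zebra", "A Mouse", '"Ant" Life']
for t in sorted(titles, key=normalize_title):
    print(normalize_title(t), "<-", t)
```

Sorting with the cleaned form as the key leaves the original titles intact for reading, while the alphabetical order ignores the quotes and articles.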

Now I go through the first 100 titles of each year (total 300 titles) and compare them to the holdings in the two collections.  This is time-consuming, and I quickly realized just a sample would do.  That might not seem like a large sample, but the results were so consistent, I decided it was sufficient.
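I did this check by hand, but the same comparison could be sketched in code.  The version below assumes the Choice titles and both collections’ titles have already been cleaned up the same way, and it matches on exact title only, which is cruder than a human eye (it would miss subtitle differences, for instance).  All names and data here are hypothetical.

```python
def mark_holdings(sample_titles, holdings_a, holdings_b):
    """Return (title, in_a, in_b) rows, like the two added spreadsheet columns."""
    set_a, set_b = set(holdings_a), set(holdings_b)
    return [(t, t in set_a, t in set_b) for t in sample_titles]

def summarize(rows):
    """Count titles held by each collection and check whether the columns agree."""
    held_a = sum(1 for _, in_a, _ in rows if in_a)
    held_b = sum(1 for _, _, in_b in rows if in_b)
    identical = all(in_a == in_b for _, in_a, in_b in rows)
    return held_a, held_b, identical

# Toy data standing in for one year's Choice list and the two collections.
choice_titles = ["title one", "title two", "title three", "title four"]
collection_a = ["title one", "title three", "something else"]
collection_b = ["title three", "title one", "another book"]

rows = mark_holdings(choice_titles[:100], collection_a, collection_b)
print(summarize(rows))
```

The slice `[:100]` mirrors taking the first 100 titles of a year’s list; the third value in the summary says whether both collections hold exactly the same sampled titles.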

The holdings for each year were identical for both collections: same titles held, same number of titles in each 100.

Again, this is probably largely to do with publishers.  Some of them don’t want to have ebooks or want to keep them away from the large vendors; some might sell through these vendors but the licensing or pricing make the titles cost-inefficient to put into bulk collections such as these.  So, both vendors are working from pretty much the same pool of titles, I suspect (for academic titles, anyway).

There’s also the fact that both vendors almost certainly have access to Choice and expect librarians to use that as a standard, so they are likely to focus on getting what they can of those OAT titles for their Academic collections.

User Survey

Now, I did a little informal survey of students who were going through our promotional event in Fall 2013.  I had them look at a screenshot of the same title in both A and B versions, and asked which they preferred, if they had a preference.

Caveat: one of the reference librarians preferred A, and I leaned to B, on this one.  We’re both flexible, however, so left it up to the students.

After the results were tallied, B came out ahead by a considerable margin with students, although A still had some adherents.  There were details such as having controls that did, or did not, scroll off the page, and the use of non-obvious controls for tasks.  Ebooks are so new that controls are not all that standardized, but some actions need to be obvious, and stay on the screen if the entire page is not visible.

Factors, therefore, come out fairly close, although B might have more older materials which give it a larger count.

We use Adobe Digital Editions for Windows/Macs, and suggest Bluefire for use with Android and iOS since the apps are pretty much the same (and therefore easy to help people with).

Internal Factors

Okay, let’s consider internal factors such as upkeep and price.

Collection B is from the same vendor that handles our Discovery service.  A, however, is not, and therefore we have to do more upkeep on sending files to the Discovery service vendor.  There’s staff time on that, including changing settings on the deleted ebooks before sending those records, which varies from month to month.  I have to download records to the catalog in either case, but B relieves us of the chore of updating the Discovery service.

Price: B is cheaper, by enough to buy at least one other database.  (We have the A invoice and a quote for collection B.)  These days, that’s a humongous factor, although it can be outweighed or at least balanced with enough on the other side.


As of spring 2014, the final decision by the director, considering all these factors, is to switch to Collection B.

This is not without consequences.  Because Collection A expires at the end of March, we’re changing in mid-semester.  Students who thought they had an ebook for their paper may find it missing later, for example.

I’ll have to take all the old ebooks from Collection A out of the catalog, and update the Discovery service, as well as OCLC WorldShare, on our holdings.


Also, I do a modification of bibliographic records in our catalog for our Programs page, which allows faculty to find all the materials in their subjects, as well as allowing me to keep track of materials for accreditation purposes.  (I’ve discussed this in another post.)  Now I’ll have to start from scratch with a huge batch of new records, adding 690 fields.  But, I knew the job was dangerous when I took it, Fred (as Super Chicken used to say).

[Please note that this is not to be considered an endorsement for any purposes, and no reimbursement was received in any form, including discounts, for this post.  That’s why I didn’t name vendors.  YMMV — Your Mileage May Vary]