Ebooks and overlays, backfired

So I celebrated Ebooks and overlays earlier. I didn’t want to mention my underlying fear.

It’s happened.

I have 160 records from vendor B that just overlaid records from vendor A. They have the same OCLC record numbers.

Nuts. I just knew this was coming. OCLC has multiple ebook vendors on the same records, which leads directly to this situation. And it’s not going to improve in the future, I expect.

Now, I could try to keep both on the same record, but all too often one vendor removes records while the other does not. (At present, vendor B is guilty of this more often, but that could change.) Trying to deal with that could end up being more complex and prone to error.

Now, if I duplicate the records and edit them to be exclusive of each other, at some point in the future, one vendor will want to overlay to delete, which means the catalog cannot decide which to overlay, resulting in a new record being created. I can usually pick these out by looking for new records created during the overlay of deletion records.

But first, I have 160 records to separate. <sigh!>


Ebooks and Overlays

So finally vendor p..q…. has moved to using OCLC record numbers in the 001 fields of their records, instead of their own numbering system.

That means that I can now download records and — for example — overlay records with a notation to delete them. Huzzah! Couldn’t do that without some complex and expensive shenanigans with importing records before, when the numbering system was non-standard.

So, I take the academic collection as a basis and download that with NO overlays – new records must be inserted for all of them.

Then I take the public library collection (and later one for education, when it becomes available) and overlay it, so I don’t have a bunch of duplicate records. Then I remove duplicate fields, such as the 856 links, and get over 65,000 records (so far).

One little glitch. Not all the records used by this vendor are unique — a few (more than a dozen) are the same numbers as those used by a couple of other vendors, so those records are overlaid. Now I have multiple vendor links in the same records.

So, I have to find those records and make copies, and tweak until I have unique copies of the records with separate vendor links. That way, I can remove records from vendor A without losing records from vendor B.
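Finding the shared records is the part that lends itself to a script. Here is a minimal sketch, assuming I can export a simple list of (001 number, vendor) pairs from the catalog; the sample numbers and the function name are made up for illustration and are not anything my system provides.

```python
from collections import defaultdict

def shared_oclc_numbers(rows):
    """Return {oclc_number: sorted vendor names} for numbers used by 2+ vendors."""
    vendors_by_number = defaultdict(set)
    for oclc_number, vendor in rows:
        vendors_by_number[oclc_number.strip()].add(vendor)
    return {
        number: sorted(vendors)
        for number, vendors in vendors_by_number.items()
        if len(vendors) > 1
    }

# Hypothetical export: one row per vendor link found on a record.
rows = [
    ("ocn100000001", "vendor A"),
    ("ocn100000002", "vendor A"),
    ("ocn100000002", "vendor B"),
    ("ocn100000003", "vendor B"),
]
collisions = shared_oclc_numbers(rows)
for number, vendors in collisions.items():
    print(number, "is shared by", ", ".join(vendors))
```

Anything that shows up in that list is a record I need to copy and split by vendor.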

It would have been nice if p..q…. had taken the original record, copied it, and changed it with their own link, and saved it under another OCLC record number in every case. It would have been. They seem to have done that for the most part, but at this time, there are a few little exceptions — and I have no idea why.

Close, people, really close. 4 out of 5 stars.


Ebooks and the stats

“Given the proliferation of e-Books and the discussions about the future of print books, the results demonstrate that only 4% of Americans are ‘e-Book only’ readers. Only 28% of Americans have ever read an e-Book and e-Book readers also read print books. Additionally, as the Survey results indicate, people prefer the two formats in different circumstances. People traveling prefer e-Books because they are portable and baggage restrictions don’t apply to them. Print books are ideal for reading to children and for sharing with friends and family. So, the two formats tend to be more complementary than competitive.

The contest between offering free e-Books through libraries and publishers’ fear of losing sales has been brought to rest by another surprising revelation. Active library users also tend to be the most active book buyers, print or e-Book. So, free e-Books in libraries do not actually drive down sale figures.”

from http://www.webjunction.org/news/webjunction/pew-research-center-7-surprising-facts-about-libraries.html

Just this weekend I purchased some e-books in a series which I like. I got epub format so they would go on my phone in the Bluefire app, and give me something to read during odd moments, usually while waiting for something. At the same time, I am reading print format books from the public library, part of the time while digitizing my phonograph records for music in my vehicle (since operating a phonograph-based stereo system in a moving vehicle is, if anything, even more unsafe than texting). All of that is consistent with the Pew results above (excepting the digitizing stuff).

And there is the argument that some have been making for a while now that e-books will take over and print will become (or already has become) obsolete. Thomas A. Edison did not predict that, but he predicted that the phonograph record would replace books (you know, those little wax cylinders; oh, wait, that changed to flat discs), and then motion pictures, and poor old Tom still didn’t have it right when we moved to DVDs of movies, and then streaming videos. The fact is, we have multiple formats because we have multiple uses, and multiple contents, and changing technologies, and personal preferences. Even the inventors cannot predict how long a new format will last. That’s why I’m digitizing my old phonograph records.

While the big publishers are reporting that e-book sales have been falling in 2015, or at least flattening (which is disputed elsewhere as only referring to those publishers’ sales), it seems that multiple formats will remain in the future. Which formats are available may shift, but enough users and buyers need that flexibility that versatility continues to be the preference. Print, however, continues to endure and even thrive.

I already have several series in which I have read some volumes in print and others in e-book format. It’s a little harder to keep track of, but it allows me to use the format that is convenient/cost-effective at the moment.

And so it goes…

Ebooks and Overlays

When you buy ebooks, you have the records in the catalog and that’s pretty much it.  It’s a steady state deal, and if you weed, you do it by hiding or deleting the records.

Subscriptions such as “such and such Collection” are another thing entirely.  I didn’t realize until we’d been doing it a while how much change would occur as records were added and deleted.

I needed to overlay, but I didn’t realize that until too late.  The catches (multiple!) included:

  • Some ebooks came from more than one vendor, but the OCLC record might be the same.  If you overlay the same record with multiple links, how do you deal with having to remove several thousand from one vendor as that subscription changes but the other vendor still (at least for a while) continues to include them?
  • How do you deal with having purchased an ebook specifically, but it also shows up in the subscription, so overlaying the record means you may have to remove the subscription link but not the purchased one?
  • What do you cue on for overlaying records?  ISBNs turned out to be a bad idea, but not all ebooks come with an 001 field for OCLC number — some put that in the 035 field, or need to use the 001 field for a custom number (which then won’t load into our system since it is not recognized as a proper OCLC number).
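The last catch, what to cue on, can be sketched in code. This is only an illustration, assuming each record has been reduced to a plain dictionary with an "001" string and a list of "035" values; the function name and that record shape are my own invention, not how any load table works. The ocm/ocn/on prefixes in the 001 and the (OCoLC) prefix in the 035 are the usual OCLC conventions.

```python
import re

# OCLC control numbers: optional ocm/ocn/on prefix, then digits.
OCLC_001 = re.compile(r"^(?:ocm|ocn|on)?0*(\d+)$", re.IGNORECASE)
# In the 035 the number is normally prefixed with (OCoLC).
OCLC_035 = re.compile(r"^\(OCoLC\)(?:ocm|ocn|on)?0*(\d+)$", re.IGNORECASE)

def oclc_match_key(record):
    """Return the bare OCLC number from the 001, else from an 035, else None."""
    match = OCLC_001.match(record.get("001", "").strip())
    if match:
        return match.group(1)
    for value in record.get("035", []):
        match = OCLC_035.match(value.strip())
        if match:
            return match.group(1)
    return None

# Hypothetical records: an OCLC 001, a custom 001 with the OCLC number
# in the 035, and a custom 001 with no OCLC number anywhere.
print(oclc_match_key({"001": "ocm00012345"}))
print(oclc_match_key({"001": "ebr10012345", "035": ["(OCoLC)654321"]}))
print(oclc_match_key({"001": "EBC123"}))
```

One weakness to keep in mind: a vendor's custom 001 that happens to be all digits would be mistaken for an OCLC number by this sketch, which is the same kind of ambiguity that bites in real loads.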

Combining that with several vendors, and changing major ebook subscriptions from ebrary to ebscohost and THEN getting ebrary from a statewide subscription on top of that…

So, our ebscohost ebooks were a mess.  Lots of duplicate records, and that meant that more and more of those couldn’t be overlaid (which one?) so yet another was added instead, making it worse.

During spring break, I’ve been delete-proofing the ebscohost ebooks specifically purchased (put PURCHASED after the OCLC number), and then deleting the rest — something like 180,000 or so. A number of these were actually duplicates (don’t ask how many — I have no way to tell except when I run across them).

Then I began reloading the ebscohost from scratch.  First batch: 50,000 records.

Wish that had been simpler.  Taught me the complications, however.  Many overlaid our older (owned) netlibrary ones, for example.  Yes, ebscohost owns netlibrary now, but many of the old links still work, and I need to keep those purchased ones separate.  So, fixed that and removed the ebscohost links.  Have to remove all the ebscohost other than that as well, just to be sure.

Found the credo and cambridge and so on that were overlaid and fixed those.  And changed the 001 numbers to add the vendor name after the number to ALL the non-ebscohost ebooks, to try to prevent overlayment.  Vendors other than ebscohost and ebrary, these days, come in small enough amounts that I can deal with them without overlaying records.
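For anyone wondering why tagging the 001 protects a record, here is a small sketch. It assumes the overlay is an exact match on the 001 value and that the tag is simply appended in capitals with no separator; both are simplifying assumptions for illustration, and the function names are hypothetical.

```python
def protect_control_number(control_number, tag):
    """Append a tag (vendor name or PURCHASED) to an 001 value, once only."""
    value = control_number.strip()
    suffix = tag.strip().upper()
    if value.upper().endswith(suffix):
        return value  # already protected; do not tag twice
    return value + suffix

def would_overlay(existing_001_values, incoming_001):
    """Mimic an exact-match overlay on the 001 value."""
    return incoming_001.strip() in existing_001_values

# Hypothetical catalog: one protected credo record, one untouched record.
catalog = {protect_control_number("ocn123456789", "credo"), "ocn222222222"}

print(would_overlay(catalog, "ocn123456789"))  # protected record is skipped
print(would_overlay(catalog, "ocn222222222"))  # untouched record still matches
```

The tagged record no longer matches an incoming record with the plain number, so the load inserts a new record instead of overlaying mine.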

Having one OCLC record (ideally) for each ebook vendor, and a separate record for the print editions (even if the ISBNs appear on both), allows the ebooks to be handled separately.  I tried, in early days, combining on the same record.  Big mistake — couldn’t keep up with all the changes.  So I’ve separated them again.  Lots of work both times.

And I mentioned that I have ebrary records again, from a statewide subscription, which fortunately use a different 001 numbering and an ebr prefix.  Unfortunately, it won’t download with my load table, so overlayment is not going to work with them, at present.  Another librarian (thanks!) suggested I put that number in the 903 field, and have Innovative add the 903 field to the o index (like the 001 field) so I can overlay additions/deletions to ebrary using the 903.  I added the ebrary ahead of that change, so we’d have some ebooks during the process of working out the ebscohost situation.

So, once I find all (one hopes) the possible wrong overlays, I load the ebscohost from scratch all over yet again.

New idea — since I don’t really need Cutter (092 subfield b) to keep ebooks in order on the shelf, I can change that to the vendor name for ebooks.  That makes it easy to distinguish quickly when I find it in Sierra, such as using CTRL-g to check the index for other copies of a title.

Ended up with a lot fewer duplicates. Still, we now have over 235,000 ebooks and pdf files.

Now I have to add the 690 PROGRAM fields to these all over again. Job security!


Amazon Kindle for Windows 8


The Kindle app software on Windows 8 comes out a bit of a mystery.  Where the heck are the controls?  Nice they don’t intrude, but I need them.

Google search, and I discover that I have to right-click to get them.  That’s apparently press-and-hold in Windows 8, or I can unfold and use the keyboard’s touch pad, or just use a mouse.  Maybe an explanation on that first time I used the app would have been nice.

Upper right corner, and a title bar appears so I can close or minimize, thanks to a Microsoft update on apps.

Display mode

Pages show up in two columns in landscape mode, but make a very nice full page in portrait mode.  I’m very happy with that so far.

I prefer to use the white on black setting to cut down on glare.  Also, the less strong light directed at my eyes late at night may help me to get to sleep more easily, or so some sources claim.

Library display

The library display is, by default, all the covers.  Nice, but there are advantages to a simple list with thumbnail cover displays.  You don’t get that option (or any other) on the Windows 8 Kindle from all I can discover.  Hasn’t Amazon even seen Adobe Digital Editions?  It is not perfect, either, but it is still ahead of Kindle on this.

Suggestions for Amazon

Amazon is, IMHO, not doing themselves any favors by not making it easier to manipulate their collections.

I need to be able to change to an alternate list format, see what I have already read (VERY important once you get more than a dozen or so books), sort by author or title in each list, group books both online and on my tablet… maybe this is possible on a Kindle device, but limiting it on the Kindle software is making it increasingly awkward to handle my growing collection.  I’m starting to wonder if I should keep my collection here growing, Amazon.  Think about it:

  • Change to list only view, perhaps with smaller thumbnails as one alternative view
  • Be able to sort by authors and series in series order (if I bought the set, which do I read next?)
  • Be able to separate out the ones I’ve read and have those marked (hey, Kindle is supposed to sync all this stuff, so make this possible)
  • Move a few of my collection onto a device (which has limited storage), read them, have it noted that I read them, and then let me move them out to the cloud again.  Right now, the only way to get them out of the way in the cloud is to delete them entirely from my cloud.  For those with limited space on a device (tablet or phone, for example), you may only need a few titles downloaded on the device.  Oh, and they need to be marked as read in the cloud so I don’t bring them back down unless I want to read again.

Kindle is not really functional for heavy readers — the very people it wants to attract.  Kindle has been around long enough that it should have been working this out well before now.



Adding Programs to ebooks

Followup on my post on the Programs tracings in the 690 fields.

So, having gone clear through all the bulk purchase of 80,000 plus ebooks from one vendor, we’ve now switched to another vendor for 130,000 plus records, and I’ve had to start over from scratch in adding 690 fields for our various Programs.  All the previous ebooks from that vendor have been removed.

Oh, joy.

Still, I’ve got the technique down pretty well in III’s Millennium software.

1.  Create a list of the new ebooks records.  To keep the searching and file size down for faster operations, I limit it to all the call numbers below a certain one — say, all the call numbers under 300.  I also eliminate all the records that already have a 690.  That way, as I progress, I just increase the upper limit and don’t worry about the lower one.

Oh, yes — when more ebooks are added while I’m working on this long project (as they have been already), they will fall into the proper place, and I’ll catch up with them as I proceed.  If they are in the earlier numbers, they’ll show up at the top of the list.

2.  Run the search.  (This is the part where I switch over and catalog books, write posts like this, check Feedly, or do other chores while waiting.)  It takes a little while.

3.  Sort the records by call number.  Fortunately, this latest vendor uses records downloaded from OCLC, so almost all of them have Dewey call numbers, and the few that have oddball 082 suggested Dewey numbers can quickly be located and fixed.

4.  Display the list.

Okay, there’s tens of thousands of them, at the beginning.  But, starting from the top, I only go so far before the titles are obviously differing in subjects significantly.

As I scan down, I take note of any that might make good book reviews or are simply of interest to me, and add a code in the record for my use.

5.  When the subject changes, highlight from there to the end of the list, and remove those records from the file.  Now I have a much shorter list of like subjects.

6.  Go to Global Update and select the much reduced file.  In order to find it quickly in the Global Update, I use a string of identical characters in the list title, such as @@@@@@ so I can spot the file quickly as I slide down them.

7. Insert one or more appropriate 690 PROGRAM fields and update the batch.

8.  Go back to step one, search by the same search I did before, and this time, the search will eliminate all the ones I just updated.  Repeat.  When I finally run out of records in this chunk, I can increase the call number limit by enough to get a good chunk again.
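The loop in the numbered steps above can be sketched outside Millennium. This is a rough illustration only: records are plain dictionaries, the call number is assumed to be a bare Dewey class number with no Cutter, and "same whole-number Dewey class" stands in for my by-eye judgment of where the subject changes. The titles and function names are hypothetical.

```python
def needs_program(record, upper_limit):
    """True for records under the call number limit with no 690 yet."""
    return not record["690"] and float(record["call"]) < upper_limit

def next_batch(records, upper_limit):
    """Sort the pending records and keep the leading run of one Dewey class."""
    pending = sorted(
        (r for r in records if needs_program(r, upper_limit)),
        key=lambda r: float(r["call"]),
    )
    if not pending:
        return []
    first_class = int(float(pending[0]["call"]))
    return [r for r in pending if int(float(r["call"])) == first_class]

def add_program(batch, program):
    """Stand-in for the Global Update step: add one 690 to every record."""
    for record in batch:
        record["690"].append(program)

records = [
    {"title": "T1", "call": "150.1", "690": []},
    {"title": "T2", "call": "150.9", "690": []},
    {"title": "T3", "call": "155.4", "690": []},
    {"title": "T4", "call": "370.15", "690": []},
    {"title": "T5", "call": "152", "690": ["PROGRAMPSYCHOLOGY"]},
]

batch = next_batch(records, 300)          # first pass
add_program(batch, "PROGRAMPSYCHOLOGY")
second = next_batch(records, 300)         # same search, updated records drop out
print([r["title"] for r in batch], [r["title"] for r in second])
```

Because the search excludes anything that already has a 690, rerunning it never revisits finished records, and records done early for an accreditation simply never appear.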

Now, this has obvious limitations.  For one thing, a lot of titles are not all that descriptive, so I have to check the records for those, which takes time.  Fortunately, it isn’t necessary in most cases.

Also, given the nature of catalogers, institutions, and cataloging, a number of titles end up in call numbers which I would not have chosen, either due to the oddities of the cataloging system (whether Dewey or LC), or just because the cataloger had different priorities from us (see how politely I put that?).  After all, someplace without, say, an Education program might not catalog a given book in that call number range (370s) if it dealt with, say, Psychology as well.  It might end up in the 150s rather than the 370s on that campus, while we would put it in the 370s if it looked more useful there.  Do I bother to reclassify it?  Usually no; I just add a 690 for PROGRAMEDUCATION and that takes care of it.  And probably a PROGRAMPSYCHOLOGY as well, since I can have as many relevant 690s on a record as I need.

People are using key words today, not call numbers.  I’ve talked to catalogers from other campuses who don’t even bother with call numbers for ebooks, since call numbers were intended to group like subjects together on shelves, and ebooks don’t need shelves.  Call numbers may not be furnished with some ebooks records (and I have to pull those out and determine the 690s individually).

So in between everything else I do, I slowly whittle down the ebooks list.  If we have a Program with an accreditation coming up soon, I’ll jump to the relevant ebooks and do them ahead of time, so they’ll count properly as being available in our collection.  With this system, they will not be picked up with the rest when I reach their range later since they have 690s already, so it’s not especially disruptive to the entire process.

The one factor that such bulk purchases of ebooks do complicate is the frequent request for spending info.  How much did we spend on Program A last fiscal year?  Aside from a specialized ebook collection (say, Business, which we have), I can’t really do anything for the general “academic collection” of ebooks in that formula without a HUGE investment in trying to assign print prices to ebooks (which wouldn’t be accurate, since that’s not what we really paid), or giving a figure of $0.002 or something (a percentage of the total annual fees) for the cost of a title, which wouldn’t add enough to the expenditure to be worth the effort.  So, ebooks tend to count in numbers but not in money spent, which is not fair, but that’s the situation.

On the other hand, some students cannot/will not use ebooks, so should that factor into the utility of the funds spent?  That’s not covered by accreditation teams, as far as I’ve been told: if you have the title in some format, it counts.  I suppose you could look at it much like the case of someone who cannot read print when we don’t have the title as an audiobook.  If there is demand (or likely to be), we’ll probably buy a print version also, as the budget permits.  Audiobook versions, on the other hand, are not always available for many academic titles, and you may or may not be able to use an ereader to read the text aloud to you (publisher’s choice!).

This is going to be a months-long process.  I consider it job security.


Comparing ebook collections

We more than doubled the size of our collection by subscribing to a (if you’ll pardon the term) “bulk” collection from an ebook vendor a couple of years ago.  It’s a cost-effective way to get access to a lot of material which will be available even when the building is closed or some distance from the user.  Over 70,000 records to start and, at the time of comparison, up to 88,907.  Let’s call that Collection A.

These “bulk” purchases come in various packages of differing sizes, and can either specialize in subject areas, or be — as we selected — an “Academic” collection, which may or may not hold some or all of the specialized collections.

We’re not above looking at alternatives, however, so we’ve done some comparisons with an offering from another vendor, where the Academic collection is presently shown as 119,757 (based on a spreadsheet download of titles at the time of comparison).  Let’s call that Collection (wait for it…) “B.”

So, how do we evaluate such a massive amount of records? And, how do we do it as fairly as possible?  Given the sizes, numbers are going to have to suffice for most purposes, which is not my preferred standard, but you work with what you can manage.


I started with date ranges.  Bring them into spreadsheets and sort by publication date.  This isn’t as accurate for comparisons as it might be, for several reasons.  For one thing, I have updates on Collection A received over the months since we got it.  Is it fair to compare it exactly for the current year (part way through 2013) when the B spreadsheet may not include any titles updated since the spreadsheet was created for marketing?  I’d prefer to see 2013 but not weight it as much as I might.

Also — and this is much harder, probably impossible, to compensate for — a number of ebooks tend to be shown with the ebook publication date, even though the original print date is years earlier, such as a classic work that has just become an ebook.  This tends to flex the ebook records to appear more recent than the contents actually are.  Add to that the fact that some publishers don’t come out with the ebook until after the print, and the dates are skewed further.  I can’t really do much about this, given the large numbers I’m dealing with.

Dates are also tricky in that many records have odd punctuation, “c” for copyright, brackets, and quite a few multiple dates, but this shows up only in Collection A.  Does that mean Collection B has more accurate dates, or just more of them updated to the ebook edition regardless of the original print edition?  I suspect the latter, given a few of the titles which I can tell are older but show 2013.
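Cleaning those dates by hand is tedious, so here is a minimal sketch of how the years could be pulled out of messy date strings before sorting. It assumes the dates have been exported as plain text and that any four-digit number from 1500 through 2099 is a year; the function name and sample strings are made up for illustration.

```python
import re

# A four-digit year (1500-2099) not embedded in a longer run of digits.
YEAR = re.compile(r"(?<!\d)(1[5-9]\d\d|20\d\d)(?!\d)")

def publication_years(raw_date):
    """Return every plausible year found in a messy date string, in order."""
    return [int(year) for year in YEAR.findall(raw_date)]

samples = ["c2009.", "[2011]", "2012, c2008", "n.d."]
for raw in samples:
    years = publication_years(raw)
    earliest = min(years) if years else None
    latest = max(years) if years else None
    print(repr(raw), earliest, latest)
```

Whether to count a multi-date record under its earliest or latest year is a judgment call. The earliest is probably closer to the age of the content, but neither fixes the records that only carry the ebook date.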

I do go back several years, to 2008, and compare the numbers.  Since I’ve got two different base sizes, however, I then figure by percentages, to keep it a bit more fair.

Collection A consists of 4.8% titles from 2012.  Collection B consists of 3.4%.

Collection A consists of 7.15% titles from 2011, Collection B has 7.12%.

Collection A consists of 10.6% titles from 2010, Collection B has 9.44%.

Collection A consists of 8.95% titles from 2009, Collection B has 9.04%.

Collection A consists of 8.29% titles from 2008, Collection B has 7.63%.

Okay, let’s jump to a range:

Collection A consists of 88.33% titles from 2000 or later, Collection B has 77%.
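I did these in a spreadsheet, but the same percentages can be sketched in a few lines. The list of years below is toy data, not either collection, and the function names are my own; the point is only that dividing by each collection's own size is what lets two different-sized collections be compared.

```python
from collections import Counter

def share_by_year(years):
    """Percent of the collection published in each year."""
    counts = Counter(years)
    total = len(years)
    return {year: round(100 * n / total, 2) for year, n in counts.items()}

def share_since(years, start_year):
    """Percent of the collection published in start_year or later."""
    recent = sum(1 for year in years if year >= start_year)
    return round(100 * recent / len(years), 2)

# Toy data: ten publication years standing in for one collection.
toy_years = [2012, 2012, 2011, 1999, 2005, 2010, 2010, 2010, 1995, 2008]
by_year = share_by_year(toy_years)
print(by_year[2012], by_year[2010], share_since(toy_years, 2000))
```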

So, Collection B has more titles, but year by year the two are close as percentages of each collection, until we get to 2000 or later, where B falls behind.  How significant is that, given that B’s percentage is from a much larger collection?  Maybe that 11-point difference is a smaller factor than it may appear, but it should be kept in mind, at least until some more comparisons can be made.  Going by the other percentages, however, the collections aren’t far apart in recent years.
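One way to judge that gap is to turn the percentages back into title counts. A rough sketch, taking the sizes and percentages quoted in this post at face value (B's 77% is already rounded, so treat the output as approximate): it comes to roughly 78,500 titles from 2000 or later for A against roughly 92,200 for B.

```python
# Sizes and percentages as quoted in this post; the rounding is mine.
SIZE_A, SIZE_B = 88_907, 119_757
SHARE_A, SHARE_B = 88.33, 77.0  # percent of titles dated 2000 or later

def titles_in_share(collection_size, percent):
    """Convert a percentage of a collection back into a title count."""
    return round(collection_size * percent / 100)

recent_a = titles_in_share(SIZE_A, SHARE_A)
recent_b = titles_in_share(SIZE_B, SHARE_B)
older_a = SIZE_A - recent_a
older_b = SIZE_B - recent_b
print(recent_a, recent_b, older_a, older_b)
```

So B's lower percentage still represents more recent titles in absolute terms, along with a good many more older ones.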

Now, a really in depth comparison could take the collections and do this again by sorting first by call number, and then by date, to see how the collections stack up by subject area.  However, this is (a) more time-consuming, and (b) not necessarily as useful as might be expected.

Publishers are weird about ebooks.  You can have key publishers in certain subjects refuse to be part of third-party collections because (a) they don’t do ebooks, or (b) they do ebooks on their own, or (c) they have ebooks through this vendor but only individually, not in the collections, or (d) it’s just the wrong POTM (Phase Of The Moon).  So collections which look skimpy in a particular subject might just be caught by some of these factors, and/or by when the publisher(s) signed the contract (as in, before or after the new owners came in and made changes not already limited by that contract).  So, I’m going to pass on trying to do anything such as that.

Choice Titles

Now, instead of just calculations, come the Choice Outstanding Academic Titles lists.  My director thought this would be a good standard to apply, and it’s a very convenient one at that.

I used the three previous years: 2010, 2011, and 2012.  I sorted by title (to randomize the subjects covered a bit), and downloaded the files to spreadsheets (and I’m very glad they set up that feature on Choice).

I have to play with the titles a bit — remove the quotes, leading articles, like that.  The download is NOT alphabetized by title at this point, so next I do that.  Then I remove the unneeded columns, and add two columns on the far left for the A and B collections.

Now I go through the first 100 titles of each year (total 300 titles) and compare them to the holdings in the two collections.  This is time-consuming, and 300 might not seem like a large sample, but the results were so consistent that I decided it was sufficient.

The holdings for each year were exactly identical for both collections.  Same titles held, same number of titles in each 100.
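I did the matching by eye in spreadsheets, but the title cleanup and comparison can be sketched like this. It is only an illustration: the titles are invented, the function names are mine, and matching on a normalized title alone would miss subtitle variations and different editions that a person would catch.

```python
import re

LEADING_ARTICLE = re.compile(r"^(?:a|an|the)\s+", re.IGNORECASE)

def normalize_title(title):
    """Strip quotes, a leading article, punctuation, and case for matching."""
    text = title.strip().strip('"\u201c\u201d\'')
    text = LEADING_ARTICLE.sub("", text)
    text = re.sub(r"[^\w\s]", "", text)
    return re.sub(r"\s+", " ", text).strip().lower()

def held_titles(sample, collection):
    """Return the sample titles that also appear in the collection."""
    owned = {normalize_title(title) for title in collection}
    return [title for title in sample if normalize_title(title) in owned]

# Hypothetical sample list and one hypothetical collection.
sample = ['"The First Example"', "A Second Example", "Third Example: A Study"]
collection_a = ["First example", "Third Example : a study"]

held = held_titles(sample, collection_a)
print(len(held), "of", len(sample), "held")
```

Run the same sample against each collection and the two counts (and the two lists of held titles) can be compared directly.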

Again, this is probably largely to do with publishers.  Some of them don’t want to have ebooks or want to keep them away from the large vendors; some might sell through these vendors but the licensing or pricing make the titles cost-inefficient to put into bulk collections such as these.  So, both vendors are working from pretty much the same pool of titles, I suspect (for academic titles, anyway).

There’s also the fact that both vendors almost certainly have access to Choice and expect librarians to use that as a standard, so they are likely to focus on getting what they can of those OAT titles for their Academic collections.

User Survey

Now, I did a little informal survey of students who were going through our promotional event in Fall 2013.  I had them look at a screen shot of the same title in both A and B versions, and asked which they preferred, if they had a preference.

Caveat: one of the reference librarians preferred A, and I leaned to B, on this one.  We’re both flexible, however, so left it up to the students.

After the results were tallied, B came out ahead by a considerable margin with students, although A still had some adherents.  There were details such as having controls that did, or did not, scroll off the page, and the use of non-obvious controls for tasks.  Ebooks are so new that controls are not all that standardized, but some actions need to be obvious, and stay on the screen if the entire page is not visible.

Factors, therefore, come out fairly close, although B might have more older materials which give it a larger count.

We use Adobe Digital Editions for Windows/Macs, and suggest Bluefire for use with Android and iOS since the apps are pretty much the same (and therefore easy to help people with).

Internal Factors

Okay, let’s consider internal factors such as upkeep and price.

Collection B is from the same vendor who handles our Discovery service.  A, however, is not, and therefore we have to do more upkeep on sending files to the Discovery service vendor.  There’s staff time on that, including changing settings on the deleted ebooks before sending those records, which varies from month to month.  I have to download records to the catalog in either case, but B relieves the chores of updating the Discovery service.

Price: B is cheaper, by enough to buy at least one other database.  (We have the A invoice and a quote for collection B.)  These days, that’s a humongous factor, although it can be outweighed or at least balanced with enough on the other side.


As of spring 2014, the final decision by the director, considering all these factors, is to switch to Collection B.

This is not without consequences.  Due to Collection A expiring at the end of March, we’re changing in mid-semester.  Students who thought they had an ebook for their paper may find it missing later, for example.

I’ll have to take all the old ebooks from Collection A out of the catalog, and update the Discovery service, as well as OCLC WorldShare, on our holdings.


Also, I do a modification of bibliographic records in our catalog for our Programs page, which allows faculty to find all the materials in their subjects, as well as allowing me to keep track of materials for accreditation purposes.  (I’ve discussed this in another post.)  Now I’ll have to start from scratch with a huge batch of new records, adding 690 fields.  But, I knew the job was dangerous when I took it, Fred (as Super Chicken used to say).

[Please note that this is not to be considered an endorsement for any purposes, and no reimbursement was received in any form, including discounts, for this post.  That’s why I didn’t name vendors.  YMMV — Your Mileage May Vary]