RDA conversion

Beginning November 2018 into 2019:

I have been charged with converting our catalog to RDA format records. This includes updating our authority records, which have not had a comprehensive update since 1999.

We have bids from two vendors, and have selected one of them. They will convert any records not already in current RDA format, and also go through our authority records to cull and update them.

Since I haven’t found a step-by-step for this, it looks like I’m going to be creating one: how to convert your III Sierra catalog to RDA format – how I did it good. (I hope good, anyway.)

The vendor has a detailed manual for their part of the process, including the decisions to be made. This is very helpful. However, I don’t have something comparable for Sierra, so I’ve had to figure some things out for myself.

For starters, I need to have Innovative refresh our RDA setup through their “Review current configuration for RDA” service. We did this a few years ago, but we can do it again to be up to date. There’s a list of operations III will do as part of this:

  • confirm existing load tables will not exclude RDA data fields.
  • install the most current MARC validity tables and special table fields.
  • since we have Webpac Pro, we have some of the fields already set to display in webpub.def
  • index and adjust the RDA fields.

I’ve also gotten in touch with colleagues at another academic library in the state to ask for advice, and will follow up with them further.

The vendor has a pretty complete manual, covering the options and defaults.

Catalog Record Numbers

There are obviously two main phases in the process, at least at our end.

  1. Export bibliographic and authority records to the vendor.
    1. This is actually two parts: a sample to check the configuration
    2. The actual files to convert
  2. Import the completed records to our catalog.

That means that we need the catalog record number (as opposed to the OCLC number, ISBN, or anything else) to line up the exact record to overlay. We may have multiple records for the same OCLC number or ISBN (more than one copy, or one for each ebook vendor, for example).

Since we already export bibliographic records for use in our Discovery service, we have a format for that, which ends in

87 907 |a.b17555279@  (the 87 being the line number, which varies)

What matters is that the 907 field carries the catalog record number in subfield a: a period, then b, then the number.

Below that is another line if there is an item record.

88 945 |cSeason 1|g1|i1000124651|j0|lrrdsc|nContains 3 DVD discs|op|p$24.60|qc|r |s- |t201|u2|v1|w0|x2|y.i11129839|z161019@

The 945 field will not be needed for this purpose. The vendor can just strip that off before returning.

However, the 907 catalog record number will have to be put into a 949 field for importing and overlay in our catalog. Then it will work just like downloading records from, say, OCLC or ebook vendors using the Data Exchange. While we normally don’t use the overlay for catalog record numbers, it is available in the load table and tests out properly.

Authority records, however, have not been exported from the catalog before this project. It can be done, but there is no catalog record number included. We’ll need to have a format created for those which has that in a 907 or other required field. Again, we normally don’t use that in the 949 to download from OCLC, but the specific record number option is available for authority records and tests out.

And now I have that, just done by III:

37 907 |a.a10613043|b181022|c170405|d-|e2|f-@

The line number 37 is irrelevant. The “a10613043” is our catalog record number, and using that in a 949 on import should overlay the existing record with that number.

So pull the subfield a from the 907 in both types of records and use the ov=[period record number] to overlay.

The resulting line for bib records looks like this in MARCEdit:

=949 \\$a*recs=b;ov=.b10000057;
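A minimal sketch of that step in Python, assuming the records have been broken out into MarcEdit's mnemonic text format (one field per line, tag followed by two spaces, then indicators). The function name `make_949` is made up for illustration; the `recs=b` and `ov=` pieces are copied from the line above, and the right `recs` value for authority records is not something this sketch knows.

```python
import re

def make_949(field_907: str, recs: str = "b") -> str:
    """Build a 949 overlay line from a 907 line.

    Assumption: subfield a of the 907 holds the period-prefixed record
    number (e.g. .b17555279), introduced by either '$a' or '|a'.
    """
    match = re.search(r"[$|]a(\.[a-z]\d+x?)", field_907)
    if match is None:
        raise ValueError("no record number found in subfield a of the 907")
    # Two backslashes stand for the two blank indicators in MarcEdit.
    return f"=949  \\\\$a*recs={recs};ov={match.group(1)};"


if __name__ == "__main__":
    print(make_949("87 907 |a.b17555279@"))
```

The same function accepts the Sierra-style `|a` line shown earlier or a MarcEdit `$a` line, so it can be run against whichever export is at hand.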

Getting started

So, we had a chat with the vendor (conference call) about how this is going to be handled. I had filled out the forms for RDA and authority work, and we went over them and made some tweaks.

They recommend deleting all the authority records and replacing them with the fresh set provided by the vendor. So, overlays are not needed; forget the 907 with the record number. Makes me a bit nervous, but apparently this is standard procedure. I included the locally-created authority records we have in the sample file I sent.

A sample file of bib records and one of authority records is at the vendor, and once we get something back, we can check compatibility.

I expect to just look at the authority records returned, and hold off anything more radical until we have a full set back, since that will eventually be a full replacement.

I asked for records in batches of ten thousand, so I can do these in small chunks to avoid overworking our hosted Sierra system.

Sample files returned

The 949 fields were not included, so I cannot overlay the bib records.  However, I did spot checks and comparisons.

I’ve decided to have the RDA Enriched designation moved to the 946, so the old item record info in our pre-2000 era records will not conflict in the 945.

We had the 440 fields changed to 490 and I had that re-indexed in the catalog already, so those will work.

The |h subfield in the 245 fields is excluded nowadays, so that is copied to a 500 note and removed from the 245. That was a step we had considered handling ourselves later, but we left it in to start so it could help with creating the 3xx fields. This way, we still have it if needed but it complies with RDA conventions.

We’ve used the 541 field for ordering info, and since it wasn’t needed when we updated our discovery service, I hadn’t paid much attention to the fact that it wasn’t included in the export file. That doesn’t matter, as I’ve tested, so not to worry about it. It can just sit in the original record.

Odd little detail – since the $ sign is used in MARCEdit as a subfield delimiter, any other place it appears has been replaced with {dollar} so I guess I’ll have to edit that after it goes into Sierra, if it turns out to be important to do so.
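To judge whether it is important, one option is to count where the `{dollar}` placeholder shows up before loading. This is only a sketch, assuming the returned file has been broken out to MarcEdit mnemonic text; the function name is hypothetical.

```python
from collections import Counter

def dollar_placeholders_by_tag(mnemonic_text: str) -> Counter:
    """Count '{dollar}' placeholders per MARC tag in mnemonic text."""
    counts = Counter()
    for line in mnemonic_text.splitlines():
        # Field lines start with '=' followed by the three-character tag.
        if line.startswith("=") and "{dollar}" in line:
            counts[line[1:4]] += line.count("{dollar}")
    return counts
```

If the counts land only in staff-only fields, the placeholder can probably stay as it is.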

The Authority reports are interesting but not alarming — many of the change lists are quite short.

However, the Status 1 (Local) and 2 (Created) records did not come back. I may have to retain those in the catalog and just delete everything else. We keep some authorities for local use that will not show up in LCSH, and I’d like to retain them.

I’m not professionally happy about the long-standing convention of not using |vBiography after subject names, to indicate clearly what the relationship of the work is to the person, but call me old-fashioned. I guess I can learn to put up with it.

Also, the format info such as |vVideo recordings has been removed from subjects.

IMHO, removing the media designations from the subjects and the 245 is based on the idea that users are sufficiently sophisticated that if they want to include/exclude types of media, they will use the options provided. However, I notice this is being expected from users — the same users who are reluctant to bother looking beyond the first page of results which they get from — all too often — very minimal search terms and no media limitations. I’m not seeing a great deal of sophistication, frankly, from most of our users. Putting more info out there in the list results seems more likely to have it noticed, as opposed to sidebars with the limits to check. But again, I’m old-fashioned, in some opinions.

Overlaying samples

I normally have some protections set for overlaying records. For example, if ebooks overlay, I don’t want to have them mess up the changes already made in the existing record.

But now, I can’t do that. I have to have a complete overlay of the bib records.

In telnet this is already set:

  • A > A > S > O > D Database maintenance >
  • 23 Edit overlay protection list >
  • 2 Bibliographic record from tape (since these go through like tape records) and
  • 6 OCLC Bibliographic record (since these go out of Data Transfer like OCLC records)

We normally protect:

* o 001 OCLC NUMBER (supposed to, but may not, protect the OCLC number; the OCLC field can duplicate), as some ebook vendors (Credo, for example) put something else in this field.

* i 020 ISBN (protects the order, as we prefer the ten-digit version listed first)

* CALL NUMBER 092 (protects the call number field, as we may have changed it)

* r 300 pagination (pagination may be overlaid by ebook info)

* NOTE 500-599 (protects our notes)

* SUBJECT 600-691 (protects our Subject changes and PROGRAM field)

* MARC 856 (protects changes to the 856)

Indicators are .. and code is n.

Now, however, I have to take all that out long enough to do this RDA change, at least.

Short Sample File

Now I get really cautious. I take the 1,000 record sample file and reduce it to the first few records for a short file. Not too many to fix, if needed. Then I skip down to the ebooks, which are mostly newer records, and take another few records from there. Now I have ten records, older and newer, to check.
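For what it's worth, the same cut can be sketched in Python, assuming the sample is a binary MARC file where each record ends with the 0x1D record terminator. The function name and the default offsets are made up for illustration; where the ebooks actually start has to be found by looking at the file.

```python
def short_sample(data: bytes, head: int = 5, tail_start: int = 500,
                 tail: int = 5) -> bytes:
    """Return the first `head` records plus `tail` records from further down."""
    terminator = b"\x1d"  # MARC end-of-record byte
    records = [r + terminator for r in data.split(terminator) if r.strip()]
    # Never let the second slice reach back into the first one.
    tail_start = max(tail_start, head)
    picked = records[:head] + records[tail_start:tail_start + tail]
    return b"".join(picked)
```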

And the result of the short sample test is (drum roll):

  • The call number 092 did not overlay because we added a |e subfield for dates, etc. So now we have two call numbers, since the vendor removed the |e subfield and therefore they didn’t match exactly. I can resolve that by having the vendor omit the call number entirely and leave the original.
  • It also creates some notes conflicts:

504 “Bibliographical notes”: pages 735-744. Bibliography: pages 745-754.

is created from

504 “Bibliographical notes”: p. 735-744. Bibliography: p. 745-754.

The difference is that the p. abbreviation is changed to “pages” and so now they don’t overlay. I can’t remove the duplicate easily, because the line no longer matches exactly. I need to have the original removed and keep the changed version. Might be easier to just leave out the 504.

  • Also, I have other notes which duplicate exactly but unnecessarily:

n 500 Translation of Israël et le refus arabe.

n 500 Translation of Israël et le refus arabe.

And it’s not the umlaut since that appears in both. Need to just leave these out so the original field is not affected or duplicated.

  • Our 583, 590, 690 and the 856 and such are duplicated.  Easier if I tell the vendor to omit those in the overlay record and just leave the originals – those are staff-only fields not visible in the OPAC anyway and not affected by RDA changes.
  • I asked to have the 020 ISBNs reordered so the ten-digit one is listed first. The vendor had this request before, but I was apparently the first one to explain that this allowed image linking to the book cover. However, we ended up with a lot of duplication in the ISBNs, especially with ebooks. Need to dedupe these and the OCLC numbers.

Basically, I need to omit the fields to keep unchanged, so only the changed fields are included.
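The vendor did this on their end, but the idea can be sketched in Python, assuming MarcEdit mnemonic text. The tag list is my reading of the bullets above (092, the 500 and 504 notes, 583, 590, 690, 856), not the vendor's actual profile, and the function name is hypothetical.

```python
# Tags to leave out of the overlay record so the originals stay untouched.
OMIT_TAGS = frozenset({"092", "500", "504", "583", "590", "690", "856"})

def strip_fields(record_text: str, omit=OMIT_TAGS) -> str:
    """Drop the listed tags from one record in mnemonic format."""
    kept = []
    for line in record_text.splitlines():
        # Field lines look like '=245  10$aTitle'; the tag is chars 1-3.
        if line.startswith("=") and line[1:4] in omit:
            continue
        kept.append(line)
    return "\n".join(kept)
```

Leaving out the whole tag is blunt, but it avoids the near-match problem with the 504 entirely.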

Second sample file

With the fields above removed, the second sample file went well with some test records.

I did notice that by putting the ten-digit ISBNs up top, I got a slightly odd result. I noticed this some time ago, but hadn’t taken any action as yet.

In the 020 fields, subfield c is described as “terms of availability” while subfield q is described as “qualifying information”. Various catalogers of ebooks have used these in the same manner: |c or q (ebk.) for example.

That means that I have two copies of the same ten-digit ISBNs at the top of the 020 fields in a number of records, but because the subfield type differs, Sierra cannot see them as duplicates. By putting them both at the top, however, I have unintentionally emphasized the situation.

I suppose I can do a Global Update and change all the |c to |q, say, and then deduplicate the fields. But, it’s not terribly urgent.
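Outside Sierra, the same cleanup could look something like this. It is a sketch only: it assumes each 020 is available as a MarcEdit-style subfield string, treats a parenthetical in subfield c as if it were subfield q, and then drops exact duplicates. The function name is made up.

```python
import re

def dedupe_isbns(fields):
    """Normalize $c(...) to $q(...), drop duplicates, ten-digit ISBNs first."""
    seen = set()
    result = []
    for field in fields:
        # Only a $c followed by a parenthetical is changed, so a real
        # terms-of-availability value such as a price is left alone.
        normalized = re.sub(r"\$c(\s*\()", r"$q\1", field)
        if normalized not in seen:
            seen.add(normalized)
            result.append(normalized)

    def is_ten_digit(field):
        match = re.search(r"\$a([0-9Xx-]+)", field)
        return bool(match) and len(match.group(1).replace("-", "")) == 10

    # sort() is stable, so the original order is kept within each group.
    result.sort(key=lambda f: 0 if is_ten_digit(f) else 1)
    return result
```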

Otherwise, it looks good and I have submitted the bib records and authority records to the vendor for processing just before the holidays.

First major bib file – January 7, 2019

Well, the vendor returned everything in one big file, so I had to ask for it to be split into the 10,000 record chunks.
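If the vendor had not been able to split it, a rough do-it-yourself version in Python would be something like the following. It assumes a binary MARC file, where every record ends with the 0x1D record terminator; the function name is hypothetical.

```python
RECORD_TERMINATOR = b"\x1d"  # end-of-record byte in MARC 21 / ISO 2709

def split_marc(data: bytes, chunk_size: int = 10_000):
    """Yield byte strings, each holding up to chunk_size whole records."""
    records = [
        r + RECORD_TERMINATOR
        for r in data.split(RECORD_TERMINATOR)
        if r.strip()  # skip the empty piece after the last terminator
    ]
    for start in range(0, len(records), chunk_size):
        yield b"".join(records[start:start + chunk_size])
```

Each chunk can then be written to its own file and loaded separately through Data Exchange.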

Suspicious as usual, I took samples out of the chunks, and did a sample test batch.

Then I went looking, and found something I’d missed on the earlier test by focusing on the text body. There’s always something else…

The Location, BType and BStatus fields are being overlaid by the template. Since I have mixed locations and formats, this blows all the bib record fixed fields for these out of the water, since the template will screw up some of them no matter what.

Now, I can set the Location in the template to none so it does not overlay from the template. But the BType (fixed field 30) and BStatus (fixed field 31) are mandatory, and I cannot create a blank option.

HOWEVER – what I can do is set overlay protection on Location, BType, and BStatus fields so they do not overlay. So I did that. Tested and works properly.

And next… the 946 field. Turns out, it is not in our load template. So it doesn’t show up with the RDA notice in it in the catalog records. Back to III to have 946 added to our load profile.

First 10,000

And now that’s out of the way…

The first 10,000 failed to load completely. Got one of those weird “zzzzz” location code errors and it shut down partway.

Supportal to III asking about it. The reply was:

The “zzzzz” location code comes from the HOLDING SYMBOLS table, it is a failsafe location in case your incoming records do not have a valid symbol in the 049 field.  Some libraries do not need/use it as a warning — so I have removed that setting for you at this time.

The number of errors about the 001 field?  The errors are triggered only if you have an invalid OCLC number,  I have also increased the maximum for you.  (BTW: your current file: 082417.dat.lmarc — does not have any errors, because the records in that file have valid OCLC numbers, and valid 049 values.)”

Since the Holding Symbols failsafe has not — to my knowledge — been a problem except in generating these errors, I’m satisfied with removing that.

In a followup (when the 001 error blew up the next attempt to load) I was told:

Please note that the 001 field for OCLC record is consider invalid, because it does not have an OCLC prefix.

If you do not want to have these 001 errors reported, we can turn OFF the validity check at 001; please note that your load tables are set to validate OCLC number, where it checks for whether the 001 field does not begin with “ocn”, “ocl7”, or “ocm”, “ocn”, “on” ..  turning this validation OFF means any 001 field will be considered valid and accepted.  Many libraries opt for this because non-OCLC records/materials may have any format in the 001 field (such as EB12345, or SSL12345).

I’m happy to turn this off, so decided to do that. It just chokes too easily, and not just on these records — I’ve had a number of others with this problem. Records come from a number of vendors who don’t always prefix with “ocn” or another option, and it’s not a useful failsafe for us.
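To see ahead of time which records would trip that check, the prefix test can be approximated in Python. The prefixes are the ones quoted in the reply above; how the load table actually does the comparison is an assumption on my part, and the function name is made up.

```python
# Prefixes quoted in the support reply; the real load table may differ.
OCLC_PREFIXES = ("ocm", "ocn", "on", "ocl7")

def has_oclc_prefix(control_number: str) -> bool:
    """True if the 001 value starts with one of the OCLC prefixes."""
    return control_number.strip().lower().startswith(OCLC_PREFIXES)
```

Note that the bare “on” prefix is short enough to match some non-OCLC values too, which is one more reason not to lean on it as a failsafe.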

Got that turned off, and ran the file again. It only did the ones it missed before.

Our 049 fields had a code that Sierra wasn’t expecting: ASZW, which we had created for our Pebley Center (historical/local research) some years back. We had that on about 70 older records, back when we still made a point about the 049 field having separate codes from our usual.  So some of those turned up as errors, but not enough to kill the job. However, since that doesn’t really matter these days, I did a Global and changed all of those to a conventional code so there should be no future problems there.

I also got an invalid location code on one record, but the problem seems to be a stray indicator (049  1 instead of just 049) so no big deal.

Next 10,000

I added a “batch 01” to the end of the 946 fields of the first batch, so I could differentiate them from the next batch.

That’s because I also had to deduplicate the i 020 ISBN and o 001 OCLC number fields for the batch. I don’t want to do an ever-increasing group dedupe each time I add a batch, and doing huge batches is time/processor power-consuming. Doing batch by batch is fast enough.

I continue doing this for each batch. Since the new batch doesn’t have the word “batch” in the 946, it’s easy to pull them into a list. That will also allow me to pull them up in smaller batches of 10,000 later if necessary.

I do some spot checking, but at this stage, each batch seems pretty solid. These earlier ones are the older records, however. The later records tend to be more complex, so we’ll see about those when I reach them.

Had one title with errors – we had withdrawn the title so nothing to overlay. Not a problem.

Freezing on a load

On the 9th batch, Sierra choked (over 9,27x records out of 10,000 done). Only 2 errors and neither serious; apparently it just had a bad moment. Kill Sierra, and start over. Import the file again, Z process it, and try again. Nope, choked again.

Supportal recommended:

Problem

When loading records in Data Exchange, it freezes in the middle of the load

Solution

Go to Data Exchange and view the file. You will see the “start block” is at a number other than 1. The system is having a problem with loading the record at that block. What you can do is view the file in Data Exchange, “jump to” the block in question, then scroll to the next record. The software will tell you the Start Block number for that record (other words, skip over loading the record at the problem block). When you restart the load, change the Start Block to the Start Block number of the next record in the file. You may also want to look at the record causing the problem and manually recreate it.

So, I killed Sierra, restarted, converted the file over again, found the problem record, and printed it. Then I changed the start block to the next record after the one where it stopped, and completed the load.

Errlog says:

1 > 61348 Invalid block in MARC file
2 > 61349 Invalid block in MARC file
3 > 61350 Invalid block in MARC file
4 > 61351 Invalid block in MARC file
5 > 61352 Invalid block in MARC file
6 > 61353 m2tab global or special routine name is unknown: %format( %s)%replace(“-“,”0”)

So 61347 through 61353 records are a problem. I printed them and cannot see a reason for the “invalid block” but the ones beyond these went through without a hitch. File completed. Oddly enough, I still got 10,000 records in the Create Lists to check, so the invalid block didn’t stop them from being overlaid. I checked those and they seem to have been overlaid properly.

So now our first batch of non-subscription ebooks and other files is done.

After I take care of some other delayed matters, I will work on the authority records, which means testing a download, then deleting the old ones and downloading the new ones.

Authority Records

So today (2019.1.22) I delete the existing authority records in the catalog (over 60,000 of them) and replace them with the ones from the vendor, as recommended by the vendor.

Of course, the file of over 60,000 choked and I had to close Sierra.

Fine, get a smaller file and do it 25,000 records at a time.

I’m ignoring the GENRE and TITLE authority records at this point. They aren’t that important, from our point of view.

Sierra has specific import functions for Name and Subject records, so I converted all the files of each into proper format in their proper functions, and have them ready to go in ASAP after the deletions are completed.

Date displays in the brief citation lists

I counter-attacked a problem this week that had been bugging me a while.

[Image: 2date_sample]

The brief citation display we used to have for a number of our records was misleading people who saw dates (on the right, under the book graphic) such as “2006-” which was what displayed for serial book records.  (You don’t see that in the example for the reason explained below.)  People overlooked the hyphen and assumed the date of the title was 2006 instead of realizing that was when the entire series began.  Faculty kept asking us to update titles which were already up-to-date.

We used serial records for control reasons, which made it easier to track and update them.  But the date in the 260 subfield c was what displays in the catalog, and that was the series starting date followed by a hyphen to indicate the series was still continuing.

So, I experimented and proposed to the staff, after some discussion, that we begin putting the date of the latest edition at the end of that subfield.  Example: |c2006-2016  [latest edition owned]

Yes, that is cataloging heresy, but the result is shown above, since the ending date is displayed and the text is ignored.  The latest edition now appears.

We’ll have to update this field every time we add a new edition, but it’s a small price to pay to make it easier for people to understand the catalog.  If you go to the individual record, the  [latest edition owned] is visible so it explains to the lay user what the date really means.

Progress is often incremental.  This week, I incremented.

Another day, another icon

We have a few titles in the catalog that are both online databases and available in paper:  Value Line and the Arkansas Code, for example.

I had a request to come up with a new icon to show in the search results that indicated the materials available, rather than one or the other.

The problem is, we’ve maxed out on material types.  These are in BCode2 and ICode2 in the III system, and since we use ICode2 for inventory and I try to keep these the same as much as possible, I had to hunt for something I could salvage.

Fortunately, we’ve eliminated filmstrips, so I reused that code “p” and made it “Database & Books” instead.

That still left me without a little graphic for the icon to show on the right.  So, I reworked and re-named the same picture from ebooks and made it work for this.

A keyword search on “value line” will show the results.

Tabs in the catalog

Working on the catalog display in Innovative (III) today and making these notes to myself.

I originally used the May 2007 set downloaded from III, with some modifications I picked up at IUG, for the version I brought out in August 2007.

The campus standard HAD been to accommodate 800×600 displays, and I didn’t want people using those to go from the campus pages to the library pages and find themselves suddenly running off the side of the screen with the catalog.

Since we are now no longer restricted to the 800×600 display (with the wide margins left and right), I’m reconfiguring the pages for 98% width of the screen in 1024×768 resolution. That leaves just a 1% margin on each side, which I think looks neater and less crowded than 100% across the page.

That leaves a lot of stuff sort of weighing down the left side. All the tabs, for example, start on the left, and don’t go very far across in many cases.

So, I decided I needed to change the width of the tabs to spread them out a bit. The catch is, the tabs normally on mainmenu.html, for example, are using the div class “mainActiveTab” with “menuActiveTab” and InactiveTab, and those have only one size.

However, the div class “helpActiveTab” also comes in “helpActiveTabMedium” and “Large” sizes, which seem to work just as well, so I’m replacing the “main/menu” div classes with the “help” ones. (It’s easier than adding to the CSS for “main/menu”.)

Then I changed the size to all medium tabs. Spreads further across and balances the look better, IMHO.

I’ve also taken the frame with the Quick Links and moved it over to the far right 20% of the page, and reset the percentages to 80% search and 20% Quick Links frame, now that I have the additional room. More balanced, again.

Tests okay.