RDA conversion

Beginning November 2018 into 2019:

I have been charged with converting our catalog to RDA format records. This includes updating our authority records, which have not had a comprehensive update since 1999.

We have bids from two vendors, and have selected one of them. They will convert any records not in current RDA format to that, and also go through our authority records to cull and update them.

Since I haven’t found a step-by-step for this, it looks like I’m going to be creating one: how to convert your III Sierra catalog to RDA format – how I did it good. (I hope good, anyway.)

The vendor has a detailed manual for their part of the process, including the decisions to be made. This is very helpful. However, I don’t have something comparable for Sierra, so I’ve had to figure some things out for myself.

For starters, I need to have Innovative update our system via a “Review current configuration for RDA.” We did this a few years ago, but can do it again to be up to date. There’s a list of operations III will do as part of this:

  • Confirm existing load tables will not exclude RDA data fields.
  • Install the most current MARC validity tables and special table fields.
  • Since we have Webpac Pro, confirm the fields already set to display in webpub.def.
  • Index and adjust the RDA fields.

I’ve also gotten in touch with colleagues at another academic library in the state, and will continue to ask them for advice.


Catalog Record Numbers

There are obviously two main phases in the process, at least at our end.

  1. Export bibliographic and authority records to the vendor. This is actually two parts:
    1. A sample to check the configuration
    2. The actual files to convert
  2. Import the completed records to our catalog.

That means that we need the catalog record number (as opposed to the OCLC number, ISBN, or otherwise) to line up the exact record to overlay. We may have multiple records with the same OCLC number or ISBN (more than one copy, or one for each ebook vendor, for example).

Since we already export bibliographic records for use in our Discovery service, we have a format for that, which ends in

87 907 |a.b17555279@  (the 87 being the line number, which varies)

The 907 field’s a subfield carries the catalog record number: a period followed by the b-number.

Below that is another line if there is an item record.

88 945 |cSeason 1|g1|i1000124651|j0|lrrdsc|nContains 3 DVD discs|op|p$24.60|qc|r |s- |t201|u2|v1|w0|x2|y.i11129839|z161019@

The 945 field will not be needed for this purpose. The vendor can just strip that off before returning.

However, the 907 catalog record number will have to be put into a 949 field for importing and overlay in our catalog. Then it will work just like downloading records from, say, OCLC or ebook vendors using the Data Exchange. While we normally don’t use the overlay for catalog record numbers, it is available in the load table and tests out properly.

Authority records, however, have not been exported from the catalog before this project. It can be done, but there is no catalog record number included. We’ll need to have a format created for those which has that in a 907 or other required field. Again, we normally don’t use that in the 949 to download from OCLC, but the specific record number option is available for authority records and tests out.

And now I have that, just done by III:

37 907 |a.a10613043|b181022|c170405|d-|e2|f-@

The line number 37 is irrelevant. The “a10613043” is our catalog record number, and using that in a 949 to import should overlay the existing record with the matching number.

So pull the subfield a from the 907 in both types of records and use the ov=[period record number] to overlay.

The resulting line for bib records looks like this in MARCEdit:

=949 \\$a*recs=b;ov=.b10000057;
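This step can be scripted for the bib records. Below is a minimal sketch, assuming the exported records are in MARCEdit’s mnemonic .mrk format; the function name is my own, not anything from Sierra or MARCEdit:

```python
import re

def make_overlay_field(mrk_record_lines):
    """Given one record's lines in MARCEdit mnemonic (.mrk) format, build
    the =949 line with the ov= overlay command from the 907 $a."""
    for line in mrk_record_lines:
        # A 907 line in .mrk looks like: =907  \\$a.b10000057
        match = re.match(r'=907\s+..\$a(\.[ab]\d+x?)', line)
        if match:
            return '=949 \\\\$a*recs=b;ov={};'.format(match.group(1))
    return None  # no 907 field found

record = ['=245  10$aSome title', '=907  \\\\$a.b10000057']
print(make_overlay_field(record))  # -> =949 \\$a*recs=b;ov=.b10000057;
```

The `*recs=b` here assumes bib records; an authority run would use the a-number from its 907 instead.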

Getting started

So, we had a chat with the vendor (conference call) about how this is going to be handled. I had filled out the forms for RDA and authority work, and we went over them and made some tweaks.

They recommend deleting all the authority records and replacing them with the fresh set provided by the vendor. So, overlays are not needed; forget the 907 with the record number. Makes me a bit nervous, but apparently this is standard procedure. I included the locally-created authority records we have in the sample file I sent.

A sample file of bib records and one of authority records is at the vendor, and once we get something back, we can check compatibility.

I expect to just look at the authority records returned, and hold off anything more radical until we have a full set back, since that will eventually be a full replacement.

I asked for records in batches of ten thousand, so I can do these in small chunks to avoid overworking our hosted Sierra system.

Sample files returned

The 949 fields were not included, so I cannot overlay the bib records.  However, I did spot checks and comparisons.

I’ve decided to have the RDA Enriched designation moved to the 946, so the old item record info in our pre-2000 era records will not conflict in the 945.

We had the 440 fields changed to 490 and I had that re-indexed in the catalog already, so those will work.

The |h subfield in the 245 fields is excluded nowadays, so that is copied to a 500 note and removed from the 245. That was a step we had considered handling ourselves later, but we left it in to start so it could help with creating the 3xx fields. This way, we still have it if needed but it complies with RDA conventions.

We’ve used the 541 field for ordering info, and since it wasn’t needed when we updated our discovery service, I hadn’t paid much attention to the fact that it wasn’t included in the export file. I’ve tested, and that doesn’t matter: the field can just sit in the original record, untouched.

Odd little detail: since the $ sign is used in MARCEdit as a subfield delimiter, any other place it appears has been replaced with {dollar}, so I guess I’ll have to edit that after it goes into Sierra, if it turns out to be important to do so.
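If that cleanup does become necessary, the logic is a one-liner; a sketch (the function name is hypothetical, and in practice this would likely be a Global Update in Sierra rather than a script):

```python
def restore_dollar_signs(text):
    """MARCEdit writes a literal $ as {dollar}, since $ is its subfield
    delimiter; once the data is out of MARCEdit, swap the placeholder back."""
    return text.replace('{dollar}', '$')

print(restore_dollar_signs('Contains 3 DVD discs, {dollar}24.60'))  # -> Contains 3 DVD discs, $24.60
```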

The Authority reports are interesting but not alarming — many of the change lists are quite short.

However, the Status 1 (Local) and 2 (Created) records did not come back. I may have to retain those in the catalog and just delete everything else. We have some authorities kept here for local use which will not show up in LCSH, and I’d like to retain those.

I’m not professionally happy about the long-standing convention of not using |vBiography after subject names, to indicate clearly what the relationship of the work is to the person, but call me old-fashioned. I guess I can learn to put up with it.

Also, the format info such as |vVideo recordings has been removed from subjects.

IMHO, removing the media designations from the subjects and the 245 is based on the idea that users are sufficiently sophisticated that if they want to include/exclude types of media, they will use the options provided. However, I notice this is being expected from users — the same users who are reluctant to bother looking beyond the first page of results which they get from — all too often — very minimal search terms and no media limitations. I’m not seeing a great deal of sophistication, frankly, from most of our users. Putting more info out there in the list results seems more likely to have it noticed, as opposed to sidebars with the limits to check. But again, I’m old-fashioned, in some opinions.

Overlaying samples

I normally have some protections set for overlaying records. For example, if ebooks overlay, I don’t want to have them mess up the changes already made in the existing record.

But now, I can’t do that. I have to have a complete overlay of the bib records.

In telnet this is already set:

  • A > A > S > O > D Database maintenance >
  • 23 Edit overlay protection list >
  • 2 Bibliographic record from tape (since these go through like tape records) and
  • 6 OCLC Bibliographic record (since these go out of Data Transfer like OCLC records)

We normally protect:

* o 001 OCLC NUMBER (supposed to protect the OCLC number, but may not; the OCLC field can duplicate, as some ebook vendors (Credo, for example) put something else in this field)

* i 020 ISBN (protects the order, as we prefer the ten-digit version listed first)

* CALL NUMBER 092 (protects the call number field, as we may have changed it)

* r 300 pagination (pagination may be overlaid by ebook info)

* NOTE 500-599 (protects our notes)

* SUBJECT 600-691 (protects our Subject changes and PROGRAM field)

* MARC 856 (protects changes to the 856)

Indicators are .. and code is n.

Now, however, I have to take all that out long enough to do this RDA change, at least.

Short Sample File

Now I get really cautious. I take the 1,000 record sample file and reduce it to the first few records for a short file. Not too many to fix, if needed. Then I skip down to the ebooks, which are mostly newer records, and take another few records from there. Now I have ten records, older and newer, to check.

And the result of the short sample test is (drum roll):

  • The call number 092 did not overlay because we added a |e subfield for dates, etc. So now we have two call numbers, since the vendor removed the |e subfield and therefore they didn’t match exactly. I can resolve that by having the vendor omit the call number entirely and leave the original.
  • It also creates some notes conflicts:

504 “Bibliographical notes”: pages 735-744. Bibliography: pages 745-754.

is created from

504 “Bibliographical notes”: p. 735-744. Bibliography: p. 745-754.

The difference is that the p. abbreviation is changed to “pages” and so now they don’t overlay. I can’t remove the duplicate easily, because the line no longer matches exactly. I need to have the original removed and keep the changed version. Might be easier to just leave out the 504.

  • Also, I have other notes which duplicate exactly but unnecessarily:

n 500 Translation of Israël et le refus arabe.

n 500 Translation of Israël et le refus arabe.

And it’s not the umlaut since that appears in both. Need to just leave these out so the original field is not affected or duplicated.

  • Our 583, 590, 690 and the 856 and such are duplicated.  Easier if I tell the vendor to omit those in the overlay record and just leave the originals – those are staff-only fields not visible in the OPAC anyway and not affected by RDA changes.
  • I asked to have the 020 ISBNs reordered so the ten-digit one is listed first. The vendor had this request before, but I was apparently the first one to explain that this allowed image linking to the book cover. However, we ended up with a lot of duplication in the ISBNs, especially with ebooks. Need to dedupe these and the OCLC numbers.

Basically, I need to omit the fields to keep unchanged, so only the changed fields are included.

Second sample file

With the fields above removed, the second sample file went well with some test records.

I did notice that by putting the ten-digit ISBNs up top, I got a slightly odd result. I noticed this some time ago, but hadn’t taken any action as yet.

In the 020 fields, subfield c is described as “terms of availability” while subfield q is described as “qualifying information”. Various catalogers of ebooks have used these in the same manner: |c (ebk.) or |q (ebk.), for example.

That means that I have two copies of the same ten-digit ISBNs at the top of the 020 fields in a number of records, but because the subfield type differs, Sierra cannot see them as duplicates. By putting them both at the top, however, I have unintentionally emphasized the situation.

I suppose I can do a Global Update and change all the |c to |q, say, and then deduplicate the fields. But, it’s not terribly urgent.
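If I do tackle it, the logic would look something like this sketch (the pair-list representation of the 020 fields is my own convenience, not Sierra’s; in practice this would be a Global Update plus Sierra’s field dedupe):

```python
def normalize_isbn_fields(fields):
    """Each field is a list of (subfield_code, value) pairs from an 020.
    Treat |c qualifiers as |q so exact-match deduping can work, then drop
    any field that becomes identical to one already kept."""
    seen = set()
    kept = []
    for field in fields:
        normalized = tuple(('q' if code == 'c' else code, value)
                           for code, value in field)
        if normalized not in seen:
            seen.add(normalized)
            kept.append(list(normalized))
    return kept

fields = [
    [('a', '0123456789'), ('c', '(ebk.)')],   # cataloger used |c
    [('a', '0123456789'), ('q', '(ebk.)')],   # cataloger used |q
]
print(normalize_isbn_fields(fields))  # the two collapse to one field
```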

Otherwise, it looks good and I have submitted the bib records and authority records to the vendor for processing just before the holidays.

First major bib file – January 7, 2019

Well, the vendor returned everything in one big file, so I had to ask for it to be split into the 10,000-record chunks.
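(For the record, splitting a file like this can also be done locally. A sketch, relying on the ISO 2709 record layout, where bytes 0 through 4 of each record give its total length; the function and file names are my own:)

```python
def split_marc_file(path, chunk_size=10000, prefix='chunk'):
    """Split a binary MARC (ISO 2709) file into files of chunk_size
    records each. Bytes 0-4 of every record give its total length."""
    with open(path, 'rb') as f:
        data = f.read()
    records, pos = [], 0
    while pos < len(data):
        length = int(data[pos:pos + 5])   # leader: 5 ASCII digits
        records.append(data[pos:pos + length])
        pos += length
    for i in range(0, len(records), chunk_size):
        name = '{}_{:02d}.mrc'.format(prefix, i // chunk_size + 1)
        with open(name, 'wb') as out:
            out.write(b''.join(records[i:i + chunk_size]))
    return len(records)
```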

Suspicious as usual, I took samples out of the chunks, and did a sample test batch.

Then I went looking, and found something I’d missed on the earlier test by focusing on the text body. There’s always something else…

The Location, BType and BStatus fields are being overlaid by the template. Since I have mixed locations and formats, this blows all the bib record fixed fields for these out of the water, since the template will screw up some of them no matter what.

Now, I can set the Location in the template to none so it does not overlay from the template. But the BType (fixed field 30) and BStatus (fixed field 31) are mandatory, and I cannot create a blank option.

HOWEVER – what I can do is set overlay protection on Location, BType, and BStatus fields so they do not overlay. So I did that. Tested and works properly.

And next… the 946 field. Turns out, it is not in our load template. So it doesn’t show up with the RDA notice in it in the catalog records. Back to III to have 946 added to our load profile.

First 10,000

And now that’s out of the way…

The first 10,000 failed to load completely. Got one of those weird “zzzzz” location code errors and it shut down partway.

I sent a Supportal ticket to III asking about it. The reply was:

“The ‘zzzzz’ location code comes from the HOLDING SYMBOLS table; it is a failsafe location in case your incoming records do not have a valid symbol in the 049 field. Some libraries do not need/use it as a warning, so I have removed that setting for you at this time.

The number of errors about the 001 field? The errors are triggered only if you have an invalid OCLC number; I have also increased the maximum for you. (BTW: your current file, 082417.dat.lmarc, does not have any errors, because the records in that file have valid OCLC numbers and valid 049 values.)”

Since the Holding Symbols failsafe has not — to my knowledge — been a problem except in generating these errors, I’m satisfied with removing that.

In a followup (when the 001 error blew up the next attempt to load) I was told:

Please note that the 001 field for an OCLC record is considered invalid, because it does not have an OCLC prefix.

If you do not want to have these 001 errors reported, we can turn OFF the validity check at 001; please note that your load tables are set to validate OCLC numbers, checking whether the 001 field begins with “ocm”, “ocn”, “ocl7”, or “on”. Turning this validation OFF means any 001 field will be considered valid and accepted. Many libraries opt for this because non-OCLC records/materials may have any format in the 001 field (such as EB12345, or SSL12345).

I’m happy to turn this off, so decided to do that. It just chokes too easily, and not just on these records — I’ve had a number of others with this problem. Records come from a number of vendors who don’t always prefix with “ocn” or another option, and it’s not a useful failsafe for us.

Got that turned off, and ran the file again. It only did the ones it missed before.

Our 049 fields had a code that Sierra wasn’t expecting: ASZW, which we had created for our Pebley Center (historical/local research) some years back. We had that on about 70 older records, back when we still made a point about the 049 field having separate codes from our usual.  So some of those turned up as errors, but not enough to kill the job. However, since that doesn’t really matter these days, I did a Global and changed all of those to a conventional code so there should be no future problems there.
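The Global Update itself amounts to a one-line substitution per field; a sketch (the replacement code FXYZ here is a stand-in, not our real holdings code, and the function is hypothetical):

```python
def fix_049_codes(field_value, old='ASZW', new='FXYZ'):
    """Swap a retired 049 holding code for a conventional one, leaving
    any other codes in the field alone. FXYZ is a placeholder code."""
    return ' '.join(new if code == old else code
                    for code in field_value.split())

print(fix_049_codes('ASZW'))  # -> FXYZ
```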

I also got an invalid location code on one record, but the problem seems to be a stray indicator (049  1 instead of just 049) so no big deal.

Next 10,000

I added a “batch 01” to the end of the 946 fields of the first batch, so I could differentiate them from the next batch.

That’s because I also had to deduplicate the i 020 ISBN and o 001 OCLC number fields for the batch. I don’t want to do an ever-increasing group dedupe each time I add a batch, and huge batches are time- and processor-consuming. Doing it batch by batch is fast enough.

I continue doing this for each batch. Since the new batch doesn’t have the word “batch” in the 946, it’s easy to pull them into a list. That will also allow me to pull them up in smaller batches of 10,000 later if necessary.

I do some spot checking, but at this stage, each batch seems pretty solid. These earlier ones are the older records, however. The later records tend to be more complex, so we’ll see about those when I reach them.

Had one title with errors – we had withdrawn the title so nothing to overlay. Not a problem.

Freezing on a load

On the 9th batch, Sierra choked (over 9,27x records out of 10,000 done). Only 2 errors and neither serious; apparently it just had a bad moment. Kill Sierra, and start over. Import the file again, Z process it, and try again. Nope, choked again.

Supportal recommended:


When loading records in Data Exchange, it freezes in the middle of the load


Go to Data Exchange and view the file. You will see the “start block” is at a number other than 1. The system is having a problem with loading the record at that block. What you can do is view the file in Data Exchange, “jump to” the block in question, then scroll to the next record. The software will tell you the Start Block number for that record (in other words, this lets you skip over loading the record at the problem block). When you restart the load, change the Start Block to the Start Block number of the next record in the file. You may also want to look at the record causing the problem and manually recreate it.

So: killed Sierra, restarted, converted the file over again, found the problem record, and printed it. Then I changed the start block to the next record after the one where it stopped, and completed the load.

Errlog says:

1 > 61348 Invalid block in MARC file
2 > 61349 Invalid block in MARC file
3 > 61350 Invalid block in MARC file
4 > 61351 Invalid block in MARC file
5 > 61352 Invalid block in MARC file
6 > 61353 m2tab global or special routine name is unknown: %format( %s)%replace("-","0")

So records 61347 through 61353 are a problem. I printed them and cannot see a reason for the “invalid block,” but the ones beyond these went through without a hitch. File completed. Oddly enough, I still got 10,000 records in Create Lists to check, so the invalid block didn’t stop them from being overlaid. I checked those and they seem to have been overlaid properly.

So now our first batch of non-subscription ebooks and other files is done.

After I take care of some other delayed matters, I will work on the authority records, which means testing a download, then deleting the old ones and downloading the new ones.

Authority Records

So today (2019.1.22) I delete the existing authority records in the catalog (over 60,000 of them) and replace them with the ones from the vendor, as recommended by the vendor.

Of course, the file of over 60,000 choked and I had to close Sierra.

Fine, get a smaller file and do it 25,000 records at a time.

I’m ignoring the GENRE and TITLE authority records at this point. They aren’t that important, from our point of view.

Sierra has specific import functions for Name and Subject records, so I converted all the files of each into proper format in their proper functions, and have them ready to go in ASAP after the deletions are completed.

RDA thoughts

[updated 2017.11.9]

I’ve been listening to webinars on RDA (Resource Description and Access) and related developments in cataloging.  Most recently, the Amigos webinar “Is RDA on Your RaDAr?”, which has been interesting.  Thank you, Amigos team.

It’s my responsibility, after all, as the Technical Services Librarian.  I do all the cataloging.  I have one person to follow up after all that (a much more time-consuming job, and I’m pathetically glad I have her) with the barcodes, physical processing, etc., much of it by using the tools I worked out for that (and happily delegated). Then we have a student worker to cover/place labels/pockets/etc.

I’ve been doing this long enough to have gone through some changes in AACR to AACR2, MARC formats, etc.  That includes the first records for things that exist online, even.  So I’m no stranger to changes.

However, that tends to make me a little slow to jump onto new things just because they are supposed to be improvements, at least while still in beta stages.  RDA has been out a while, but it was being tweaked for some time.

I don’t want to sound negative about evaluating newer ideas such as RDA, but I’m a big advocate of cost-effectiveness.  This is sometimes considered inconsistent with the obsessive-compulsive, sometimes perfectionist, nature of catalogers (being personally somewhat guilty as charged). When there’s nobody else to take up the slack in the workload, however, one tends to boil down to what’s really going to get used.  The perfect should not be the enemy of the “good enough for actual use by our end users.”  So, that’s how I tend to look at something like this.

I specifically asked a presenter what advantages RDA offers that, say, a Discovery service doesn’t as far as searching and limiting and other patron-relevant functions.  I did not get one specific improvement, just some talk about how it supposedly was going to be better, and Discovery services are using it or compensating for the lack of RDA, or whatever.  

We have a Discovery service now on top of the catalog, so will RDA improve on that?  What can I use to justify the work and expense of converting our existing catalog records?  That’s the sort of question I have to ask.

I’m seeing my own catalog, which would need revisions to essentially every single record to become RDA, at some cost (going through a vendor to reprocess the records, as others have done, I hear).

I have some idea of what RDA advocates are trying to improve.  MARC, admittedly, is rigid, because the standardization of MARC allows computers to search and display the data consistently.  There are coding conventions (including AACR2) that still contain holdovers from the limited amount of data which could be crammed onto paper catalog cards, such as abbreviations which could be upgraded to make records clearer to end users.  That’s a pro, in favor of RDA procedures.

And yet I saw a commenter in a webinar session reporting user reactions to the missing GMD (General Material Designation) in the 245 field (the h subfield, with terms such as [sound recording] describing the material RIGHT THERE): patrons are complaining that it is missing from RDA records. Patrons now say they have to open the entire record to see what the item actually is (since apparently they are not using the icons representing format supposedly being displayed using these fields). Our catalog and Discovery service both offer visual icons as well as GMDs, since some people will look one place and not the other.

Other librarians say the end users haven’t seemed to notice (so why are we doing this RDA stuff, again?).

The reality, I fear, is that the vast majority of patrons/end users may not use — or even care about — the features that RDA and related standards are trying to provide.  I need those supposed enhancements to be able to justify spending the money and time to non-library as well as library administration, during a time of increasingly limited funding, or we’re not getting approved for this.

So, I’m trying to line up the factors that I (not necessarily anyone else) am thinking about at this time, which may change as we progress, and consider what — if any — reason(s) might be valid for putting effort into converting to RDA.

*** There are certainly pros to RDA, such as getting away from the abbreviations used since the days of typing catalog cards, and allowing more flexibility in tracings and description.  However, I’m not seeing why elements such as those cannot be added to our existing records (other than consistency), whether or not RDA is implemented otherwise.  We are definitely overdue for some of these changes.  I could do a lot of that with just the Global Update function in III’s Millennium, however, if it seems useful enough. [No Reason]

*** Don’t we love the 007 and 008 fields?  I wonder how many lone professional-cataloger libraries were involved in creating RDA and/or implementing it…  Of course, that’s not a pro or con; a lot of stuff gets created by the members of larger staffs which benefit — or at least affect — the rest of us.  I certainly didn’t have time to sit down and write AACR2 anyplace I’ve worked.  Several presenters (in this webinar and others) talk about the meetings in committees and elsewhere, over details such as how many tracings to do, while us solo types hold such meetings in our heads.  Meanwhile, I admit I pretty much never bother with 007/008 fields on the rare occasions when I do original cataloging — anyone else who wants them is welcome to add to my record. [Reason Only Useful to Catalogers = Con]

*** I hate implementing anything until it’s completed, so I haven’t rushed to do RDA up to this point (2009 to 2014). I think it’s pretty much out there by now, with some tweaks in process.  So it’s only now that I feel I can seriously look at it, and at how it’s working for places that are using it. [Reason to Consider]

*** BIBFRAME is proposed to replace MARC coding, to allow more links and versatility in handling them.  Someday.  For open source software, perhaps.  It looks promising, but it’s not something I can use at this time. [Reason to Wait]

We have been discussing changing our ILS at some time in the future, and if we decide to do that, we may just have much of this handled automatically by using enhanced records.  [Reason to Wait]

I’ve set our catalog so it accepts and displays RDA records. We can use the incoming records downloaded from OCLC. I keep the RDA fields in those records, since they neither display nor affect anything in the online catalog which users see.  [No Reason]

All in all, I cannot see that RDA helps us, or that not having it hurts us.  At least at this time.  Maybe later.  Wait and see.

Update 2017.11.8

We have allowed the webpub.def in our Sierra OPAC (online public access catalog) to show RDA fields such as the 264, since RDA records have been using that field instead of the 260 for a while now.

I’m seeing another Amigos presentation on RDA and such. With all due respect, I’m not seeing anything that will make a bit of difference to patrons trying to use the OPAC. Still a lot of stuff being developed. Still — quite honestly — arcane.

ARCANE: As in, “known or knowable to only a few people”, as in “mysterious, obscure”. But hey, the idea is apparently still to keep understanding confined to catalogers, and that function is working quite well. Top marks.

Extra credit for terms such as “the four-fold path” (didn’t Dr. Strange come up with that one?) complete with a graphic and a need for a new ‘toolkit’ to figure out how to apply it. They are now restructuring RDA and the toolkit. Paint your wand a different color, Mr. Potter.  [Everyone following the cultural references okay?]

This is looking more and more like change for the sake of appearing to ‘update’ procedures, which quickly becomes a new way to complicate whatever you’re doing. I don’t doubt the dedication of the people working on this, but any presentation that has to quote the Hitchhiker’s Guide and say “Don’t Panic” at the end, does not fill me with confidence. No panic here – just minimal compliance to remain functional, until they finish digging all the way through the planet and come out the other side.  (“The elves are working hard” as they said.)