BIBFRAME instance mining: Toward authoritative publisher entities using association rules

Discipline

Library and Information Science

Subject

BIBFRAME
FP-growth
data mining

Copyright date

2021

Author

Jim Hahn

Abstract

With the transition of a shared catalog to BIBFRAME linked data, there is a pressing need to identify the canonical Instance for clustering in BIBFRAME. Authoritative publisher entities are a fundamental component of Instance identification. Previous work in this area by OCLC Research (Connaway & Dickey, 2011) proposed a data mining approach for developing an experimental Publisher Name Authority File (PNAF). By data mining and clustering publishers, the OCLC researchers were able to create profiles for "high-incidence" publishers. As a component of PNAF, Connaway & Dickey also provided detailed subject analysis of publishers. This presentation details a case study that applies machine learning methods to a corpus of subjects, main entries, and added entries, using them as antecedents in association rules whose consequents are publisher entities. The departure point for the present research into identifying authoritative publisher entities is the clustering, reconciliation, and re-use of ISBN and subfield b of MARC 260, along with subjects (650 - Subject Added Entry), main entries (1XX - Main Entries), and added entries (710 - Added Entry-Corporate Name), as signals that inform a training corpus for association rule mining, among other machine learning algorithms, libraries, and methods.
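
The antecedent/consequent idea described in the abstract can be illustrated with a minimal sketch. It is not the presenter's implementation and it does not use FP-growth: it counts small antecedent sets by brute force, which is only practical for toy data. The sample records, the `subj:` / `main:` / `added:` / `pub:` prefixes, the thresholds, and the function name `mine_publisher_rules` are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: one set of signals per bibliographic record.
# Prefixes mark where a value would come from (650 subjects, 1XX main
# entries, 710 added entries, 260 $b publisher); they are an assumption.
RECORDS = [
    {"subj:Linked data", "added:Example Society", "pub:Example Press"},
    {"subj:Linked data", "subj:Cataloging", "pub:Example Press"},
    {"subj:Linked data", "added:Example Society", "pub:Example Press"},
    {"subj:Botany", "main:Doe, Jane", "pub:Sample House"},
    {"subj:Botany", "pub:Sample House"},
    {"subj:Linked data", "pub:Sample House"},
]


def mine_publisher_rules(records, min_support=0.3, min_confidence=0.7, max_len=2):
    """Return rules (antecedent, publisher, support, confidence).

    Antecedents are tuples of non-publisher signals of size <= max_len;
    the consequent is always a single publisher item.
    """
    n = len(records)
    antecedent_counts = Counter()  # records containing the antecedent
    rule_counts = Counter()        # records containing antecedent and publisher
    for record in records:
        publishers = sorted(i for i in record if i.startswith("pub:"))
        signals = sorted(i for i in record if not i.startswith("pub:"))
        for size in range(1, max_len + 1):
            for antecedent in combinations(signals, size):
                antecedent_counts[antecedent] += 1
                for publisher in publishers:
                    rule_counts[(antecedent, publisher)] += 1

    rules = []
    for (antecedent, publisher), count in rule_counts.items():
        support = count / n
        confidence = count / antecedent_counts[antecedent]
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, publisher, support, confidence))
    # Most confident rules first; ties broken deterministically.
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


rules = mine_publisher_rules(RECORDS)
for antecedent, publisher, support, confidence in rules:
    print(f"{' + '.join(antecedent)} -> {publisher} "
          f"(support={support:.2f}, confidence={confidence:.2f})")
```

On a real catalog, a frequent-pattern algorithm such as FP-growth (listed among this record's subjects) would replace the brute-force counting, since enumerating every combination of signals does not scale.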

Date of presentation

2020-11-25

Conference name

SWIB20 / Semantic Web in Libraries

Conference dates

2020

Conference location

online

Recommended citation

Jim Hahn. "BIBFRAME instance mining: Toward authoritative publisher entities using association rules." SWIB20 / Semantic Web in Libraries, online, 2020-11-25.