Pl4net.info

Librarians' voices. Independent, daily.

2 February 2022
by Joachim Neubert

How-to: Matching multilingual thesaurus concepts with OpenRefine

Currently, the STW Thesaurus for Economics is mapped to Wikidata, one sub-thesaurus at a time. For the next part, "B Business Economics", we have improved our prior OpenRefine matching process. Though the use case - matching concepts in a multilingual ...

13 December 2021
by Joachim Neubert

Integrating the PM20 companies archive: part 3 of the data donation to Wikidata

ZBW inherited a large trove of historical company information - annual reports, newspaper clippings and other material about more than 40,000 companies and other organizations around the world. Parts of these, in particular everything about German and British entities until 1949, are available free and online in the companies section (list by country) of the 20th Century Press Archives. More digitized folders with material about companies in and outside of Europe up to 1960 are accessible only on ZBW premises, due to intellectual property rights.

As a part of its support for Open Science, ZBW has made all metadata of the 20th Century Press Archives available under a CC0 license. In order to make the folders more easily accessible for business history research as well as for the general public, we have added links for every single folder to Wikidata. In addition to that, the metadata about companies and organizations, such as inception date or links to board members, has been added to the large amount of company data already available in Wikidata. This continues the PM20 data donation of ZBW to Wikidata, as described earlier for the persons archives and the countries/subjects archives. The activities were carried out - with notable help of volunteers - and documented in the WikiProject 20th Century Press Archives.

The mapping process to Wikidata items

Many of the PM20 company and organization folders deal with existing items in Wikidata. If GND identifiers were assigned to these items, we directly created links to PM20 companies with the same id, and were done. Matching and linking to Wikidata items without the help of a unique identifier, however, posed some challenges. Unlike person names, company names change frequently, or are spelled differently at different times or in different languages. Not infrequently, the entities themselves change through mergers and acquisitions, and may or may not have been represented by a new folder in PM20, or by a different item in Wikidata. Subsidiaries may be subsumed under the parent organization, or be separate entities. While it is relatively easy to split items in Wikidata, in the folders with printed newspaper clippings and reports it meant digging through sometimes hundreds of pages to single out a company retrospectively. So early decisions about the cutting and delimitation of folders often stuck for the following decades. All of that made it more difficult not only to obtain matches at all, but also to decide whether the same entity is indeed covered.

For the first matching approach, we used Wikidata's Mix-n-match (M-n-m) tool. In order to get manageable "buckets", we sliced the data according to the main language of the company location (German, English, French, Dutch and Other). In the M-n-m batches, we aimed at entities of type "organization" in that language. Although we also used the available aliases, we found relatively low matching rates (and a number of "false positives" among them).

After having worked through the M-n-m suggestions, we switched to another approach: For each segment, we created a list of search statements for all folders not already linked in Wikidata. For each entry, the company name was searched via Startpage (which in turn uses Google search), supplemented by "site:wikipedia.org". That searches all Wikipedia pages as full text, so slight differences in the spelling of company names did not matter. Wikipedia pages also turned up where the company name only occurred in some context, e.g. for the founder of the company or as part of a later merger. We could then select the correct Wikipedia page from the result list, follow the "Wikidata item" link and add the PM20 folder ID to the item. Another search link on the list searched "site:wikidata.org" for existing Wikidata items. (It turned out that for Wikidata, DuckDuckGo brought better results than Startpage/Google.)
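A search list of this kind could be generated along the following lines. This is only a minimal sketch: the function name is hypothetical, and the Startpage URL pattern is an assumption (the post does not give the exact URLs that were used).

```python
from urllib.parse import urlencode

# Assumed search URL patterns; the exact URLs used for the PM20 lists
# are not given in the post.
STARTPAGE_URL = "https://www.startpage.com/do/search"
DUCKDUCKGO_URL = "https://duckduckgo.com/"


def search_links(company_name: str) -> dict[str, str]:
    """Build full-text search links for one company name (hypothetical helper)."""
    wikipedia_query = f"{company_name} site:wikipedia.org"
    wikidata_query = f"{company_name} site:wikidata.org"
    return {
        # Wikipedia pages, searched via Startpage
        "wikipedia": STARTPAGE_URL + "?" + urlencode({"query": wikipedia_query}),
        # Wikidata items, searched via DuckDuckGo
        "wikidata": DUCKDUCKGO_URL + "?" + urlencode({"q": wikidata_query}),
    }


if __name__ == "__main__":
    for target, url in search_links("Steel Brothers & Company").items():
        print(target, url)
```

The company name is deliberately not wrapped in quotes, so that the full-text search still tolerates slight spelling differences.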

When no exactly matching item was found, we sometimes added the PM20 link with a "mapping relation type" of "related match", according to the perceived usefulness for later more detailed work.

The tedious work was facilitated by a second list, which contained statements for creating missing items immediately in Wikidata's QuickStatements (QS) tool. The statements included labels in different languages, descriptions, sometimes aliases, the official name (with the legal form), type(s) and often GND ID, as in this example:

# Steel Brothers & Company {19}

CREATE
LAST|Lde|"Steel Brothers & Company"
LAST|Len|"Steel Brothers & Company"
LAST|Dde|"Unternehmen; Kolonialgesellschaft"
LAST|Den|"business; colonial society"
LAST|Ade|"W. Strang Steel & Co"
LAST|Aen|"W. Strang Steel & Co"
LAST|P4293|"co/068007"
LAST|P31|Q4830453|S248|Q36948990|S4293|"co/068007"|S1810|"Steel Brothers & Company, Ltd."|S813|+2021-08-10T00:00:00Z/11
LAST|P31|Q1700154|S248|Q36948990|S4293|"co/068007"|S1810|"Steel Brothers & Company, Ltd."|S813|+2021-08-10T00:00:00Z/11
LAST|P227|"2040532-7"|S248|Q36948990|S4293|"co/068007"|S1810|"Steel Brothers & Company, Ltd."|S813|+2021-08-10T00:00:00Z/11
LAST|P571|+1870-01-01T00:00:00Z/9|S248|Q36948990|S4293|"co/068007"|S1810|"Steel Brothers & Company, Ltd."|S813|+2021-08-10T00:00:00Z/11
LAST|P1448|de:"Steel Brothers & Company, Ltd."|S248|Q36948990|S4293|"co/068007"|S1810|"Steel Brothers & Company, Ltd."|S813|+2021-08-10T00:00:00Z/11


Since both lists followed exactly the same order (primarily by descending number of documents, to put the most relevant companies on top) and were updated every hour, the workflow was easy: step through the list, search for existing items and link them, add the remaining entries one by one or in batches to Wikidata via QS, and repeat until all entries are linked and both lists are empty. As a result, 3897 PM20 folders could be linked to existing items, while 5085 items were created from scratch. (Query code for the search list, the insert list, and the conversion to QS statements is available.)
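Statements like the ones in the example could be assembled with a small helper such as the one below. It is a sketch only, not the conversion code used for the data donation: the function names and the input structure are hypothetical, and it covers just labels, the folder ID and the referenced "instance of" statements from the example.

```python
def qs_reference(folder_id: str, folder_name: str, retrieved: str) -> str:
    """Reference part as in the example: stated in PM20 (Q36948990),
    folder ID, name as given in the source, and retrieval date."""
    return (
        f'S248|Q36948990|S4293|"{folder_id}"'
        f'|S1810|"{folder_name}"|S813|+{retrieved}T00:00:00Z/11'
    )


def qs_create(labels: dict[str, str], folder_id: str, folder_name: str,
              type_qids: list[str], retrieved: str) -> list[str]:
    """Return QuickStatements lines that create one new item (hypothetical helper)."""
    reference = qs_reference(folder_id, folder_name, retrieved)
    lines = ["CREATE"]
    # One label per language code, e.g. Lde / Len
    for lang, label in labels.items():
        lines.append(f'LAST|L{lang}|"{label}"')
    # Link to the PM20 folder
    lines.append(f'LAST|P4293|"{folder_id}"')
    # One referenced "instance of" statement per type
    for qid in type_qids:
        lines.append(f"LAST|P31|{qid}|{reference}")
    return lines


if __name__ == "__main__":
    statements = qs_create(
        {"de": "Steel Brothers & Company", "en": "Steel Brothers & Company"},
        "co/068007",
        "Steel Brothers & Company, Ltd.",
        ["Q4830453", "Q1700154"],
        "2021-08-10",
    )
    print("\n".join(statements))
```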

Enriching the metadata

After the mapping process had been finished, we added missing metadata from PM20 to all linked Wikidata items. That included country and headquarters location (having Geonames identifiers in PM20 helped a lot), inception and dissolution dates, links to predecessor and parent companies, and links to persons in their role as founders or board members.

Table of organization properties sourced in PM20:

PID    Property                                 Pre-existing items  New items  Total
P452   industry                                 5509                7682       13191
P17    country                                  769                 5105       5874
P31    instance of                              424                 5371       5795
P1448  official name                            94                  5073       5167
P159   headquarters location                    722                 3542       4264
P571   inception                                371                 3800       4171
P227   GND ID                                   816                 1708       2524
P355   subsidiary                               764                 191        955
P749   parent organization                      384                 567        951
P576   dissolved, abolished or demolished date  204                 673        877
P156   followed by                              331                 424        755
P155   follows                                  538                 209        747
P3320  board member                             460                 35         495
P112   founded by                               78                  22         100
P5052  supervisory board member                 63                  20         83

Source

Classification by industry

The PM20 companies archive was organized by industries, in two different ways: firstly, a custom classification was used for all folders, derived from an ancient version of the "economic sectors" part of the STW Thesaurus for Economics. Secondly, parts of the folders were classified according to the European economic activities classification NACE Rev. 2.

Here, the approach was to map the custom classification to existing - and a few newly built - industry items in Wikidata (see mapping). This allowed us to fill the "industry" property of all linked Wikidata items with values derived from PM20. Additionally, further matching industries were derived from the "NACE code" property in Wikidata. Interestingly, this combined approach extended the coverage of company folders by NACE significantly - from 3,648 to 6,233.

Due to incompatibilities on the conceptual level, that could not be extended to all industries. To give an example: One of the most important industry sectors in Germany, "Metallindustrie" (metal industry), cannot be represented by a NACE class: "C24 Manufacture of basic metals" is strictly separate from "C25 Manufacture of fabricated metal products", while further processing of metals, e.g. machinery and equipment, is assigned to still other classes.

Supplemented with "plain" Wikidata industries, it nevertheless proved possible to create a complete hierarchical list of companies with PM20 folders by NACE code, and in the absence of a NACE code, by Wikidata industry label.

chart of PM20 industries

(Wikidata query result with mouse-over labels)

As a result of this data donation, the coverage of 20th century companies and organizations in Wikidata has improved considerably, both in width and depth. With the links to the digitized PM20 folders, about 1.2 million document pages have been made available from the corresponding items for FAIR use in research, education and public information.

29 January 2021
by Joachim Neubert

Data donation to Wikidata, part 2: country/subject dossiers of the 20th Century Press Archives

The world's largest public newspaper clippings archive comprises lots of material of great interest, particularly for authors and readers in the Wikiverse. ZBW has digitized the material from the first half of the last century, and has put all available metadata under a CC0 license. Moreover, we are donating that data to Wikidata, by adding or enhancing items and providing ways to access the dossiers (called "folders") and clippings easily from there.

Challenges of modelling a complex faceted classification in Wikidata

That had been done for the persons' archive in 2019 - see our prior blog post. For persons, we could just link from existing or a few newly created person items to the biographical folders of the archive. The countries/subjects archives provided a different challenge: The folders there were organized by countries (or continents, or cities in a few cases, or other geopolitical categories), and within the country, by an extended subject category system (available also as SKOS). To put it differently: Each folder was defined by a geo and a subject facet - a method widely used in general purpose press archives, because it allowed a comprehensible and, supported by a signature system, unambiguous sequential shelf order, indispensable for quick access to the printed material.

Folders specifically about one significant topic (like the Treaty of Sèvres) are rare in the press archives, whereas country/subject combinations are rare among Wikidata items - so direct linking between existing items and PM20 folders was hardly achievable. The folders themselves had to be represented as Wikidata items, just like other sources used there. Here, however, we did not have works or scientific articles, but thematic mini-collections of press clippings, often not notable in themselves and normally without further formal bibliographic data. So a class of PM20 country/subject folder was created (as subclass of dossier, a collection of documents). Creating items for each folder, and linking them via PM20 folder ID (P4293) to the actual press archive folders, was however only part of the solution.

In order to represent the faceted structure of the archive, we needed anchor points for both facets. That was easy for the geographical categories: the vast majority of them already existed as items in Wikidata, a few historical ones, such as Russian peripheral countries, had to be created. For the subject categories, the situation was much different. Categories such as The country and its people, politics and economy, general or Postal services, telegraphy and telephony were constructed as baskets for collecting articles on certain broader topics. They do not have an equivalent in Wikidata, which tries to describe real world entities or clear-cut concepts. We decided therefore to represent the categories of the subject category system with their own items of type PM20 subject category. Each of the about 1400 categories is connected to the upper one via a "part of" (P361) property, thus forming a five-level hierarchy.

More implementation subtleties

For both facets, corresponding Wikidata properties were created as "PM20 geo code" (P8483) and "PM20 subject code" (P8484). As external identifiers, they link directly to lists of subjects (e.g., for Japan) or geographical entities (e.g., for The country ..., general). For all countries where the press archives material has been processed - this includes the tedious task of clarifying the intellectual property rights status of each article - the Wikidata item for the country now includes a link to a list of all press archives dossiers about this country, covering the first half of the 20th century.

PM20 country categories

The folders represented in Wikidata (e.g., Japan : The country ..., general) use "facet of" (P1269) and "main subject" (P921) properties to connect to the items for the country and subject categories. Thus, each of the 9,200 accessible folders of the PM20 country/subject archive can now be reached via Wikidata. Moreover, since the structural metadata of PM20 is available, too, it can be queried in its various dimensions - see for example the list of top level subject categories with the number of folders and documents, or a list of folders per country, ordered by signature (with subtleties covered by a "series ordinal" (P1545) qualifier). The interactive map of subject folders as shown above is also created by a SPARQL query, and gives a first impression of the geographical areas covered in depth - or as yet only sparsely - in the online archive.
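To give an idea of how the folder items can be queried, the following sketch builds a simple SPARQL query for the Wikidata Query Service. It relies only on the three properties named above (P4293, P1269, P921); the function name and the query shape are illustrative assumptions, not the queries linked from the post, and it does not assume which of the two properties points to the country item.

```python
def folder_query(limit: int = 100) -> str:
    """Build a SPARQL query listing folder items with both facets.

    Hypothetical helper; wdt:, wikibase: and bd: are prefixes predefined
    by the Wikidata Query Service.
    """
    return f"""
SELECT ?folder ?folderLabel ?pm20Id ?facetOf ?facetOfLabel ?mainSubject ?mainSubjectLabel
WHERE {{
  ?folder wdt:P4293 ?pm20Id ;       # PM20 folder ID
          wdt:P1269 ?facetOf ;      # facet of
          wdt:P921 ?mainSubject .   # main subject
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


if __name__ == "__main__":
    # Paste the output into the Wikidata Query Service to run it.
    print(folder_query(limit=20))
```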

Core areas: worldwide economy, worldwide colonialism

The online data reveals core areas of attention during 40 years of press clippings collection until 1949. Economy, of course, was in the focus of the former HWWA (Hamburg Archive for the International Economy), in Germany and namely Hamburg, as well as in every other country. More than half of all subject categories are part of the n Economy section of the category system and give in 4,500 folders very detailed access to the field. About 100,000 of the almost 270,000 online documents of the archive are part of this section, followed by history and general politics, foreign policy, and public finance, down to more peripheral topics like settling and migration, minorities, justice or literature. Originating in the history of the institution (which was founded as "Zentralstelle des Hamburgischen Kolonialinstituts", the central office of the Hamburg colonial institute) colonial efforts all over the world were monitored closely. We published with priority the material about the former German colonies, listed in the Archivführer Deutsche Kolonialgeschichte (Archive guide to the German Colonial Past, also interconnected to Wikidata). Originally collected to support the aggressive and inhuman policy of the German Empire, it is now available to serve as research material for critical analysis in the emerging field of colonial and postcolonial studies.

Enabling future community efforts

While all material about the German colonies (and some about the Italian ones) is online, and accessible now via Wikidata, this is not true for the former British/French/Dutch/Belgian colonies. While Japan or Argentina are accessible completely, China, India or the US are missing, as well as most of the European countries. And while 800+ folders about Hamburg cover its contemporary history quite well, the vast majority of the material about Germany as a whole is only accessible "on premises" within ZBW's locations. It is, however, available as digital images, and can be accessed through finding aids (in German), which in the reading rooms directly link to a document viewer. The metadata for this material is now open data and can be changed and enhanced in Wikidata. A very selective example of how that could work is a topic in German-Danish history - the 1920 Schleswig plebiscites. The PM20 folder about these events was not part of the published material, but attracted some interest with last year's centenary. The PM20 metadata on Wikidata made it possible to create a corresponding folder completely in Wikidata, Nordslesvig : Historical events, with a (provisional) link to a stretch of images on a digitized film. While the checking and activation of these images for the public was a one-time effort in the context of an open science event, the creation of a new PM20 folder on Wikidata may demonstrate how open metadata can be used by a dedicated community of knowledge to enable access to not-yet-open knowledge. Current intellectual property law in the EU forbids open access to all digitized clippings from newspapers published in 1960 until 2031, and to all where the death date of a named author is not known until after 2100. Of course, we hope for a change in that obstructive legislation in the not-too-distant future.

We are confident that the metadata about the material, now in Wikidata, will help bridge the gap until it is finally possible to use all digitized press archives contents as open scientific and educational resources, within and outside of the Wikimedia projects.

More information at WikiProject 20th Century Press Archives, which links also to the code for creating this data donation.

7 December 2020
by Joachim Neubert

Building the SWIB20 participants map

 SWIB20 participant map

Here we describe the process of building the interactive SWIB20 participants map, created by a query to Wikidata. The map was intended to help participants of SWIB20 make contacts in the virtual conference space. However, in compliance with GDPR, we wanted to avoid publishing personal details. So we chose to publish a map of the institutions to which the participants are affiliated. (Obvious downside: the 9 unaffiliated participants could not be represented on the map.)

We suppose that the method can be applied to other conferences and other use cases - e.g., the downloaders of scientific software or the institutions subscribed to an academic journal. Therefore, we describe the process in some detail.

  1. We started with a list of institution names (with country code and city, but without person ids), extracted and transformed from our ConfTool registration system, and saved it in CSV format. Country names were normalized, cities were not (and were only used for context information).

  2. We created an OpenRefine project, and reconciled the institution name column with Wikidata items of type Q43229 (organization, and all its subtypes). We included the country column (-> P17, country) as relevant other detail, and let OpenRefine “Auto-match candidates with high confidence”. Of our original set of 335 country/institution entries, 193 were automatically matched via the Wikidata reconciliation service. At the end of the conference, 400 institutions were identified and put on the map (data set).

  3. We went through all un-matched entries and either
    a) selected one of the suggested items, or
    b) looked up and tweaked the name string in Wikidata, or in Google, until we found a corresponding Wikipedia page, opened the linked Wikidata object from there, and inserted the QID in OpenRefine, or
    c) created a new Wikidata item (if the institution seemed notable), or
    d) attached “not yet determined” (Q59496158) where no Wikidata item (yet) exists, or
    e) attached “undefined value” (Q7883029) where no institution had been given

  4. The results were exported from OpenRefine into a .tsv file (settings)

  5. Again via a script, we loaded ConfTool participants data, built a lookup table from all available OpenRefine results (country/name string -> WD item QID), aggregated participant counts per QID, and loaded that data into a custom SPARQL endpoint, which is accessible from the Wikidata Query Service. As in step 1, a .csv file was produced for all (new) institution name strings which were not yet mapped to Wikidata. (An additional remark: If no approved custom SPARQL endpoint is available, it is feasible to generate a static query with all data in its “values” clause.)

    SWIB20 map data flow

  6. During the preparation of the conference, more and more participants registered, which required multiple loops: Use the csv file of step 5 and re-iterate, starting at step 2. (Since I found no straightforward way to update an existing OpenRefine project with extended data, I created a new project with new input and output files for every iteration.)

  7. Finally, to display the map we could run a federated query on WDQS. It fetches the institution items from the custom endpoint and enriches them from Wikidata with name, logo and image of the institution (if present), as well as with geographic coordinates, obtained directly or indirectly as follows:
    a) item has “coordinate location” (P625) itself, or
    b) item has “headquarters location” item with coordinates (P159/P625), or
    c) item has “located in administrative entity” item with coordinates (P131/P625), or
    d) item has “country” item (P17/P625)
    Applying this method, only one institution item could not be located on the map.
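The aggregation in step 5, including the fallback of a static “values” clause, might look roughly like the following sketch. The function names and the variable names in the clause (?institution, ?participants) are hypothetical, and the input is assumed to be a list of (country, institution name) pairs plus the lookup table built from the OpenRefine exports.

```python
from collections import Counter


def count_per_qid(participants, lookup):
    """Aggregate participant counts per Wikidata QID.

    participants: iterable of (country, institution name) tuples
    lookup: dict mapping such tuples to a QID (from the OpenRefine results)
    Returns the counts and the list of entries not yet mapped.
    """
    counts = Counter()
    unmapped = []
    for key in participants:
        qid = lookup.get(key)
        if qid is None:
            unmapped.append(key)  # goes into the .csv file for the next iteration
        else:
            counts[qid] += 1
    return counts, unmapped


def values_clause(counts):
    """Render the counts as a static SPARQL VALUES clause."""
    rows = " ".join(f"(wd:{qid} {n})" for qid, n in sorted(counts.items()))
    return f"VALUES (?institution ?participants) {{ {rows} }}"


if __name__ == "__main__":
    # Made-up sample data; Q43229 is used only as a stand-in QID.
    lookup = {("DE", "Example Library"): "Q43229"}
    participants = [("DE", "Example Library"), ("DE", "Example Library"),
                    ("NL", "Unknown Institute")]
    counts, unmapped = count_per_qid(participants, lookup)
    print(values_clause(counts))
    print(unmapped)
```

Keeping the unmapped entries separate mirrors the loop described in step 6: they become the input of the next OpenRefine iteration.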

SWIB20 participant map - detail

Data improvements

The way to improve the map was to improve the data about the items in Wikidata - which also helps all future Wikidata users.

New items

For a few institutions, new items were created:

For another 14 institutions, mostly private companies, no items were created due to notability concerns. Everything else already had an item in Wikidata!

Improvement of existing items

In order to improve the display on the map, we enhanced selected items in Wikidata in various ways:

  • Add English label
  • Add type (instance of)
  • Add headquarters location
  • Add image and/or logo

We also hope that participants of the conference took the opportunity to make their institution “look better”, for example by adding an image of it to the Wikidata knowledge base.

Putting Wikidata into use for a completely custom purpose thus created incentives for improving “the sum of all human knowledge” step by tiny step.

 

 

 

24 October 2019
by Joachim Neubert

20th Century Press Archives: Data donation to Wikidata

ZBW is donating a large open dataset from the 20th Century Press Archives to Wikidata, in order to make it better accessible to various scientific disciplines such as contemporary, economic and business history, media and information science, to journalists, teachers, students, and the general public.

The 20th Century Press Archives (PM20) is a large public newspaper clippings archive, extracted from more than 1500 different sources published in Germany and all over the world, covering roughly a full century (1908-2005). The clippings are organized in thematic folders about persons, companies and institutions, general subjects, and wares. During a project originally funded by the German Research Foundation (DFG), the material up to 1960 has been digitized. 25,000 folders with more than two million pages up to 1949 are freely accessible online. The fine-grained thematic access and the public nature of the archives make it, to the best of our knowledge, unique across the world (more information on Wikipedia) and an essential research data resource for some of the disciplines mentioned above.

The data donation does not only mean that ZBW has assigned a CC0 license to all PM20 metadata, which makes it compatible with Wikidata. (Due to intellectual property rights, only the metadata can be licensed by ZBW - all legal rights on the press articles themselves remain with their original creators.) The donation also includes investing a substantial amount of working time (over a planned period of two years) devoted to the integration of this data into Wikidata. Here we want to share our experiences regarding the integration of the persons archive metadata.

Folders from the persons archive, in 2015 (Credit: Max-Michael Wannags)

Linking our folders to Wikidata

The essential bit for linking the digitized folders was in place before the project even started: an external identifier property (PM20 folder ID, P4293), proposed by an administrator of the German Wikipedia in order to link to PM20 person and company folders. We participated in the property proposal discussion and made sure that the links did not have to reference our legacy ColdFusion application. Instead, we created a "partial redirect" on the purl.org service (maintained formerly by OCLC, now by the Internet Archive) for persistent URLs which may redirect to another application on another server in future. Secondly, the identifier and URL format was extended to include subject and ware folders, which are defined by a combination of two keys, one for the country and another for the topic. The format of the links in Wikidata is controlled by a regular expression, which covers all four archives mentioned above. That works pretty well - very few format errors have occurred so far - and it relieved us from creating four different archive-specific properties.

Shortly after the property creation, Magnus Manske, the author of the original MediaWiki software and lots of related tools, scraped our web site and created a Mix-n-Match catalog from it. During the following two years, more than 60 Wikidata users contributed to matching Wikidata items for humans to PM20 folder IDs.
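How a single regular expression can cover all four archives may be illustrated as follows. The pattern is an assumption made for this sketch (two-letter archive prefixes, six-digit keys, two comma-separated keys for subject and ware folders); the authoritative expression is the format constraint stored with property P4293 in Wikidata.

```python
import re

# Illustrative pattern only, NOT copied from the P4293 property definition.
# Assumed prefixes: pe = persons, co = companies, sh = subjects, wa = wares.
PM20_FOLDER_ID = re.compile(r"(pe|co)/\d{6}|(sh|wa)/\d{6},\d{6}")


def is_valid_pm20_id(value: str) -> bool:
    """Check a candidate folder ID against the assumed format."""
    return PM20_FOLDER_ID.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ("co/068007", "co/68007", "sh/123456,654321"):
        print(candidate, is_valid_pm20_id(candidate))
```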

For a start, deriving links from GND

Many of the PM20 person and company folders were already identified by an identifier from the German Integrated Authority File (GND). So, our first step was creating PM20 links for all Wikidata items which had matching GND IDs. For all these items and folders, disambiguation had already taken place, and we could safely add all these links automatically.

Infrastructure: PM20 endpoint, federated queries and QuickStatements

To make this work, we relied heavily on Linked Data technologies. A PM20 SPARQL endpoint had already been set up for our contribution to Coding da Vinci (a "Kultur-Hackathon" in Germany). Almost all automated changes to Wikidata we made are based on federated queries on our own endpoint, reaching out to the Wikidata endpoint, or vice versa, from Wikidata to PM20. In the latter case, the external endpoint has to be registered at Wikidata. Wikidata maintains a help page for this type of queries.

For our purposes, federated queries allow extracting current data from both endpoints. In the case of the above-mentioned missing_pm20_id_via_gnd.rq query, this lets us skip all items where a link to PM20 already exists.

Within the query itself, we create a statement string which we can feed into the QuickStatements tool. That includes, for every single statement, a reference to PM20 with link to the actual folder, so that the provenance of these statements is always clear and traceable. Via script, a statement file is extracted and saved with a timestamp. Data imports via QuickStatements are executed in batch mode, and an activity log keeps track of all data imports and other activities related to PM20.
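The logic of that step can be sketched outside of SPARQL as well. The following Python sketch is not the federated query itself: it assumes the relevant data from both endpoints has already been fetched into plain lists of dicts (the field names are hypothetical), joins them on the GND ID, skips items that already carry a PM20 link, and emits QuickStatements lines with a reference to PM20 (Q36948990) and the folder ID.

```python
def missing_links_via_gnd(pm20_folders, wikidata_items):
    """Create QuickStatements lines for items that lack a PM20 folder ID.

    pm20_folders: list of dicts with keys 'folder_id' and 'gnd'
    wikidata_items: list of dicts with keys 'qid', 'gnd' and 'pm20_id'
    (all field names are assumptions made for this sketch)
    """
    # Lookup table: GND ID -> PM20 folder ID
    by_gnd = {f["gnd"]: f["folder_id"] for f in pm20_folders if f.get("gnd")}
    statements = []
    for item in wikidata_items:
        folder_id = by_gnd.get(item.get("gnd"))
        # Skip items without a matching folder or with an existing link
        if folder_id is None or item.get("pm20_id"):
            continue
        statements.append(
            f'{item["qid"]}|P4293|"{folder_id}"'
            f'|S248|Q36948990|S4293|"{folder_id}"'
        )
    return statements


if __name__ == "__main__":
    folders = [{"folder_id": "co/068007", "gnd": "2040532-7"}]
    # "Q_EXAMPLE" is a placeholder, not a real item ID.
    items = [{"qid": "Q_EXAMPLE", "gnd": "2040532-7", "pm20_id": None}]
    print("\n".join(missing_links_via_gnd(folders, items)))
```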

Creating missing items

After matching about 93 % of the person folders which include free documents in Mix-n-Match, and some efforts to discover more pre-existing Wikidata items, we decided to create the 346 missing person items, again via QuickStatements input. We used the description field in Wikidata by importing the content of the free-text "occupation" field in PM20 for better disambiguation of the newly created items. (Here is a rather minimal example of such an item created from PM20 metadata.) Thus, all PM20 person folders which have digitized content were linked to Wikidata in June 2019.

Supplementing Wikidata with PM20 metadata

A second part of the integration of PM20 metadata into Wikidata was the import of missing property values to the corresponding items. This comprised simple facts like "date of birth/death", occupations such as "economist", "business economist", "social scientist", "earth scientist", which we could derive from the "field of activity" in PM20, up to relations between existing items, e.g. a family member to the corresponding family, or a board member to the corresponding company. A few other source properties have been postponed, because alternative solutions exist, and the best one may depend on the intended use in future applications. The steps of this enrichment process and links to the code used - including the automatic generation of references - are online, too.

Addition to Wikidata item "Friedrich Krupp AG" (Q679201) from PM20 metadata

Complex statement added to Wikidata item for Friedrich Krupp AG

Again, we used federated queries. Often the target of a Wikidata property is an item in itself. Sometimes, we could directly get this via the target item's PM20 folder ID (families, companies); sometimes we had to create lookup tables. For the latter, we used "values" clauses in the query (in the case of "occupation"), or (in the case of "country of citizenship") we had to match countries from our internal classification in advance - a process for which we used OpenRefine. Unlike PM20 folder IDs, which we avoided adding when folders did not contain digitized content, we added the metadata to all items which were linked to PM20, and intend to repeat this process periodically when more items (e.g., companies) are identified by PM20 folder IDs. As a housekeeping activity, we also periodically add the numbers of documents (online and total) and the exact folder names as qualifiers to newly emerging PM20 links in items.

Results of the data donation so far

With all 5266 person folders with digitized documents linked to Wikidata, the data donation of the person folders metadata is completed. Besides the folder links, which have already been used heavily to create links in Wikipedia articles, we now have

- more than 6000 statements which are sourced in PM20 (from "date of birth" to the track gauge of a Brazilian railway line)

- more than 1000 items, for which PM20 ID is the only external identifier

The data donation will be presented at WikidataCon in Berlin (24.-26.10.2019) as a "birthday present" on the occasion of Wikidata's seventh birthday. ZBW will continue to keep the digital content available, amended with a static landing page for every folder, which will also serve as source link for the metadata we have integrated into Wikidata. But in future, Wikidata will be the primary access path to our data, providing further metadata in multiple languages and links to a plethora of other external sources. And best of all, unlike with our current application, everybody will be able to enhance this open data through the interactive tools and data interfaces provided by Wikidata.

Participate in WikiProject 20th Century Press Archives

For the topics, wares and companies archives, there is still a long way to go. The best structure for representing these archives and their folders - often defined by the combination of a country within a geographical hierarchy with a subject heading in a deeply nested topic classification - has yet to be figured out. Existing items have to be matched, and lots of other work is to be done. Therefore, we have created the WikiProject 20th Century Press Archives in Wikidata to keep track of discussions and decisions, and to create a focal point for participation. Everybody on Wikidata is invited to participate - or just kibitz. Taking part in the mapping of one of these systems to the emerging Wikidata knowledge graph could be a particularly interesting challenge for information scientists and for people interested in historic systems for the organization of knowledge about the whole world.

 

23 October 2018
by Joachim Neubert

ZBW’s contribution to „Coding da Vinci“: Dossiers about persons and companies from 20th Century Press Archives

On 27 and 28 October, the kick-off for the "Kultur-Hackathon" Coding da Vinci will be held in Mainz, Germany, organized this time by GLAM institutions from the Rhein-Main area: "For five weeks, devoted fans of culture and hacking alike will prototype...

30 November 2017
by Joachim Neubert

Wikidata as authority linking hub: Connecting RePEc and GND researcher identifiers

In the EconBiz portal for publications in economics, we have data from different sources. In some of these sources, most notably ZBW's "ECONIS" bibliographical database, authors are disambiguated by identifiers of the Integrated Authority File (GND) - ...

2 March 2017
by Joachim Neubert

New version of multi-lingual JEL classification published in LOD

The Journal of Economic Literature Classification Scheme (JEL) was created and is maintained by the American Economic Association. The AEA provides this widely used resource freely for scholarly purposes. Thanks to André Davids (KU Leuven), who has translated the originally English-only labels of the classification to French, Spanish and German, we provide a multi-lingual version of JEL. Its latest version (as of 2017-01) is published in the formats RDFa and RDF download files. These formats and translations are provided "as is" and are not authorized by AEA. In order to make changes in JEL more easily traceable, we have created lists of inserted and removed JEL classes in the context of the skos-history project.