pulibrary / pdc_discovery / 5af13607-7741-41ac-b110-c94c8a3ee0f0

pending completion

circleci

hectorcorrea
Better error handling
Pull Request #441: Indexing to a new collection

99 of 99 new or added lines in 3 files covered. (100.0%)

1816 of 2220 relevant lines covered (81.8%)

95.45 hits per line

Source File

/app/lib/describe_indexer.rb (97.44% covered)
# frozen_string_literal: true

require 'faraday_middleware'
require 'traject'
require 'open-uri'

##
# Fetch an RSS feed of approved works from PDC Describe. For each work, index a PDC Describe JSON resource to solr.
class DescribeIndexer
  ##
  # See config/pdc_discovery.yml for configuration of the RSS feed that
  # this indexer uses to harvest data from PDC Describe.
  # @param [String] rss_url
  def initialize(rss_url: Rails.configuration.pdc_discovery.pdc_describe_rss)
    @rss_url = rss_url
  end

  ##
  # Load the traject indexing config for PDC Describe JSON resources
  def traject_indexer
    Traject::Indexer::NokogiriIndexer.new.tap do |i|
      i.load_config_file(datacite_indexing_config_path)
    end
  end

  def datacite_indexing_config_path
    pathname = ::Rails.root.join('lib', 'traject', "pdc_describe_indexing_config.rb")
    pathname.to_s
  end

  ##
  # Only index if Rails.configuration.pdc_discovery.index_pdc_describe == true
  # See config/pdc_discovery.yml to change this setting for a given environment.
  def index
    if Rails.configuration.pdc_discovery.index_pdc_describe == true
      perform_indexing
    else
      Rails.logger.warn "PDC Describe indexing is not turned on for this environment. See config/pdc_discovery.yml"
    end
  end

  # Given a json document, return an XML string that contains
  # the JSON blob as a CDATA element
  # @param [String] json
  # @return [String]
  def prep_for_indexing(json)
    xml = JSON.parse(json).to_xml
    doc = Nokogiri::XML(xml)
    collection_node = doc.at('group')
    cdata = Nokogiri::XML::CDATA.new(doc, json)
    collection_node.add_next_sibling("<pdc_describe_json></pdc_describe_json>")
    pdc_describe_json_node = doc.at('pdc_describe_json')
    pdc_describe_json_node.add_child(cdata)
    doc.to_s
  end

  def index_one(json)
    resource_xml = prep_for_indexing(json)
    traject_indexer.process(resource_xml)
    traject_indexer.complete # not covered in this build (0 hits)
  end

  private

  ##
  # Parse the rss_url, get a JSON resource url for each item, convert it to XML, and pass it to traject
  def perform_indexing
    doc = Nokogiri::XML(URI.open(@rss_url))
    url_list = doc.xpath("//item/url/text()").map(&:to_s)
    url_list.each do |url|
      resource_json = URI.open(url).read
      resource_xml = prep_for_indexing(resource_json)
      traject_indexer.process(resource_xml)
    rescue => ex
      Rails.logger.warn "Error importing record from #{url}. Exception: #{ex.message}"
      Honeybadger.notify "Error importing record from #{url}. Exception: #{ex.message}"
    end
  end
end
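
The per-record rescue in perform_indexing can be tried in isolation. The sketch below is a minimal illustration, not the project's code: import_all is a hypothetical helper, the caller's block stands in for the fetch-and-index steps, and a plain Logger replaces Rails.logger (the Honeybadger call is left out). It shows that a rescue clause attached to a do...end block (Ruby 2.5 or later) applies to each iteration separately, so one failing URL is logged and the loop moves on to the next.

```ruby
require 'logger'
require 'stringio'

# Hypothetical helper: runs the caller's block once per URL and returns
# the URLs that were processed without raising.
def import_all(urls, logger:)
  imported = []
  urls.each do |url|
    yield url          # stand-in for fetching and indexing one record
    imported << url
  rescue StandardError => ex
    # Same message shape as the warning in perform_indexing.
    logger.warn "Error importing record from #{url}. Exception: #{ex.message}"
  end
  imported
end

# Usage: the second URL fails, the other two are still processed.
log = StringIO.new
result = import_all(%w[a b c], logger: Logger.new(log)) do |url|
  raise ArgumentError, 'boom' if url == 'b'
end
p result # → ["a", "c"]
```

A bare rescue, as in the listing above, catches StandardError and its subclasses; the sketch names StandardError explicitly only to make that visible.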