
pulibrary / pymarc_dedupe
Coverage: 100%

DEFAULT BRANCH: main
Repo Added 21 Dec 2024 08:02PM UTC
Files 20
Branches:
  • main
  • actually_run
  • add_ci
  • add_postgres
  • clean_up_example_files
  • create_confusion_matrix
  • create_csv_from_dictionary
  • find_dups_single_file
  • get_coverage_back_to_100
  • give_both_input_and_output_paths_for_csv
  • goldrush_diacritics
  • i1_move_normalization
  • include_goldrush_in_export
  • increase_code_coverage
  • linting
  • pass_json_marc_records
  • record_linkage
  • record_linkage_tested
  • refactor_as_goldrush
  • remainder_of_gold_rush
  • remove_attribute_error_exceptions
  • remove_spaces_from_title
  • test_on_scsb_data

LAST BUILD ON BRANCH main
27 May 2025 10:43PM UTC coverage: 100.0%. Remained the same
Build 447d729c-fa48-403a-a433-ffb3be0b5c85
push · via circleci · committer web-flow

Use postgres database version for very large data sets (#24)

* Green locally - connect to Postgres DB

- Still need to connect to DB in other environments, including CI
- Need to clear DB between tests
- Need to make sure not to make tons of duplicate records (try to find before creating?)

* Set up DB connection for CircleCI

* Linting fixes

* Ensure that the same records are not re-created

* Try setting up environment using dynaconf

* Fix test DB connection

* Try to fix connection to DB in CI

* Remove pyproject.toml for now

* Checkpoint - green, reads and writes to/from database

* Put blocking in marc_to_db.py

* Try to increase test coverage

* Linting fixes

* Try to increase test coverage

* Test for empty input directory

* Make table creation a class method, work on mapping records to DB

* Use streaming JSON for memory performance

* Linting fixes

* Finish writing to CSV

* Linting, change to cluster_id

* More consistent naming

* Small fixes

* Linting fixes

* Increase test coverage for scoring

* rescue if there is no "a" field in author

* Use threading, override xml_reader for error handling

* Add output of comparison experiment - uses data set from Mark Z

* Setup for db comparison

* Add print statement

* Lint

* Increase coverage, lint

* Increase coverage

* Increase test coverage again

* Try using python orb for easier CI caching

* Fix edition mapping, test coverage

* Add pyproject.toml, ignore cache for ruff

* Formatting

* Re-organize into folders

* Add Goldrush to db & report

* Update data for comparison

* Ensure everything is linted, start adding module comments

* Remove unneeded comments in circleci config

291 of 291 new or added lines in 13 files covered (100.0%)

842 of 842 relevant lines covered (100.0%)

1.0 hits per line

Source Files on main: 20 files (0 changed, 0 with source changes, 0 with coverage changes)

Recent builds

Build | Branch | Commit | Type | Ran | Committer | Via | Coverage
447d729c... | main | Use postgres database version for very large data sets (#24) * Green locally - connect to Postgres DB - Still need to connect to DB in other environments, including CI - Need to clear DB between tests - Need to make sure not to make tons of dupl... | push | 27 May 2025 10:45PM UTC | web-flow | circleci | 100.0%
a061f58b... | main | Create confusion matrix (#22) * Create confusion matrix & scoring to compare with prior art * Avoid shadowed variables * Increase code coverage | push | 05 May 2025 07:21PM UTC | web-flow | circleci | 100.0%
01187b8f... | main | Add a second ML model for finding duplicates in a single file, plus abstract class (#21) * Factor out an abstract MachineLearningModel class | push | 20 Mar 2025 03:42PM UTC | web-flow | circleci | 100.0%
8af7c9eb... | main | Parse Marc records in either JSON or XML formats (#20) - Move test fixtures into their own directory | push | 14 Mar 2025 07:39PM UTC | web-flow | circleci | 100.0%
21d442bd... | main | Add a script and class to check whether a MarcXML file has duplicates in another MarcXML file (#19) | push | 21 Feb 2025 08:49PM UTC | web-flow | circleci | 100.0%
be5e98f8... | main | Use unidecode to normalize diacritics in strings (#17) | push | 20 Jan 2025 03:52PM UTC | web-flow | circleci | 100.0%
9f4cb6cb... | main | Test description without subfield a (#16) - This is not valid marc, but came up in testing against SCSB data | push | 15 Jan 2025 08:53PM UTC | web-flow | circleci | 100.0%
f3238ce5... | main | Remove AttributeError exceptions (#15) - Since MarcRecord now returns an empty string instead of None for unset values, we no longer need to rescue from AttributeErrors in the GoldRush class | push | 15 Jan 2025 06:48PM UTC | web-flow | circleci | 99.39%
9113637c... | main | Name CSV output file based on input xml file name (#14) | push | 15 Jan 2025 06:44PM UTC | web-flow | circleci | 98.8%
6581b2fd... | main | Test against SCSB file, updates to make it not error (#13) - Some of the errors were because of metadata issues, but we still don't want to raise an error for them | push | 15 Jan 2025 06:05PM UTC | web-flow | circleci | 98.78%

113 builds in total.

© 2025 Coveralls, Inc