Class: DeduplicationProcess

Process for identifying and removing duplicate records

URI: revaise:DeduplicationProcess

```mermaid
classDiagram
    class DeduplicationProcess
    click DeduplicationProcess href "../DeduplicationProcess/"
      DeduplicationProcess : dedup_completed_at
      DeduplicationProcess : dedup_criteria
      DeduplicationProcess : dedup_id
      DeduplicationProcess : dedup_method

      DeduplicationProcess --> "1" DeduplicationMethod : dedup_method
      click DeduplicationMethod href "../DeduplicationMethod/"

      DeduplicationProcess : dedup_notes
      DeduplicationProcess : dedup_performed_by

      DeduplicationProcess --> "*" Author : dedup_performed_by
      click Author href "../Author/"

      DeduplicationProcess : dedup_started_at
      DeduplicationProcess : dedup_tools

      DeduplicationProcess --> "*" ExternalTool : dedup_tools
      click ExternalTool href "../ExternalTool/"

      DeduplicationProcess : duplicate_count
      DeduplicationProcess : duplicate_groups

      DeduplicationProcess --> "*" DuplicateGroup : duplicate_groups
      click DuplicateGroup href "../DuplicateGroup/"

      DeduplicationProcess : input_record_count
      DeduplicationProcess : unique_record_count
```

Slots

| Name | Cardinality and Range | Description | Inheritance |
| --- | --- | --- | --- |
| dedup_id | 1 String | Unique identifier for the deduplication process | direct |
| input_record_count | 1 Integer | Total number of records before deduplication | direct |
| unique_record_count | 1 Integer | Number of unique records after deduplication | direct |
| duplicate_count | 1 Integer | Number of duplicate records identified | direct |
| dedup_method | 1 DeduplicationMethod | Method used for deduplication | direct |
| dedup_tools | * ExternalTool | External tools used for deduplication | direct |
| dedup_criteria | * String | Criteria used to identify duplicates (e.g., DOI, title similarity) | direct |
| duplicate_groups | * DuplicateGroup | Groups of identified duplicate records | direct |
| dedup_started_at | 0..1 Datetime | When deduplication started | direct |
| dedup_completed_at | 0..1 Datetime | When deduplication completed | direct |
| dedup_performed_by | * Author | Who performed the deduplication | direct |
| dedup_notes | 0..1 String | Additional notes about the deduplication process | direct |
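
The slot cardinalities above can be mirrored in code. The following is a minimal sketch, not generated by LinkML and not part of the revaise model: the dataclass, the use of plain strings in place of DeduplicationMethod, ExternalTool, DuplicateGroup and Author, and all example values (including the method name "automated") are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class DeduplicationProcess:
    """Hypothetical in-memory mirror of the slots listed in the table."""

    # Slots with cardinality 1 (identifier or required)
    dedup_id: str
    input_record_count: int
    unique_record_count: int
    duplicate_count: int
    dedup_method: str  # placeholder for a DeduplicationMethod value

    # Slots with cardinality * (multivalued); strings stand in for
    # ExternalTool, DuplicateGroup and Author objects
    dedup_tools: List[str] = field(default_factory=list)
    dedup_criteria: List[str] = field(default_factory=list)
    duplicate_groups: List[str] = field(default_factory=list)
    dedup_performed_by: List[str] = field(default_factory=list)

    # Slots with cardinality 0..1 (optional, single-valued)
    dedup_started_at: Optional[datetime] = None
    dedup_completed_at: Optional[datetime] = None
    dedup_notes: Optional[str] = None


# Illustrative values only; none of these come from a real review.
example = DeduplicationProcess(
    dedup_id="dedup-001",
    input_record_count=1200,
    unique_record_count=950,
    duplicate_count=250,
    dedup_method="automated",
    dedup_criteria=["DOI", "title similarity"],
)
```

Slots with cardinality 1 become positional fields without defaults, so omitting one raises a `TypeError` at construction time, while multivalued and optional slots default to an empty list or `None`.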

Usages

| used by | used in | type | used |
| --- | --- | --- | --- |
| ScreeningStage | deduplication_process | range | DeduplicationProcess |

Identifier and Mapping Information

Schema Source

  • from schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening

Mappings

| Mapping Type | Mapped Value |
| --- | --- |
| self | revaise:DeduplicationProcess |
| native | revaise:DeduplicationProcess |

LinkML Source

Direct

name: DeduplicationProcess
description: Process for identifying and removing duplicate records
from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
slots:
- dedup_id
- input_record_count
- unique_record_count
- duplicate_count
- dedup_method
- dedup_tools
- dedup_criteria
- duplicate_groups
- dedup_started_at
- dedup_completed_at
- dedup_performed_by
- dedup_notes
slot_usage:
  dedup_id:
    name: dedup_id
    description: Unique identifier for the deduplication process
    identifier: true
    range: string
  input_record_count:
    name: input_record_count
    description: Total number of records before deduplication
    range: integer
    required: true
  unique_record_count:
    name: unique_record_count
    description: Number of unique records after deduplication
    range: integer
    required: true
  duplicate_count:
    name: duplicate_count
    description: Number of duplicate records identified
    range: integer
    required: true
  dedup_method:
    name: dedup_method
    description: Method used for deduplication
    range: DeduplicationMethod
    required: true
  dedup_tools:
    name: dedup_tools
    description: External tools used for deduplication
    range: ExternalTool
    multivalued: true
  dedup_criteria:
    name: dedup_criteria
    description: Criteria used to identify duplicates (e.g., DOI, title similarity)
    range: string
    multivalued: true
  duplicate_groups:
    name: duplicate_groups
    description: Groups of identified duplicate records
    range: DuplicateGroup
    multivalued: true
  dedup_started_at:
    name: dedup_started_at
    description: When deduplication started
    range: datetime
  dedup_completed_at:
    name: dedup_completed_at
    description: When deduplication completed
    range: datetime
  dedup_performed_by:
    name: dedup_performed_by
    description: Who performed the deduplication
    range: Author
    multivalued: true
  dedup_notes:
    name: dedup_notes
    description: Additional notes about the deduplication process
    range: string
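
The `identifier`, `required` and `multivalued` flags in the slot_usage above can be checked on a plain dictionary before it is passed to a full validator. This is a rough sketch under stated assumptions: `check_record` and the two tuples are hypothetical helpers written for this page, not part of LinkML or the revaise model, and they cover only presence and list shape, not ranges.

```python
from typing import Any, Dict, List

# Slots marked `identifier: true` or `required: true` in the slot_usage.
REQUIRED_SLOTS = (
    "dedup_id",
    "input_record_count",
    "unique_record_count",
    "duplicate_count",
    "dedup_method",
)

# Slots marked `multivalued: true` in the slot_usage.
MULTIVALUED_SLOTS = (
    "dedup_tools",
    "dedup_criteria",
    "duplicate_groups",
    "dedup_performed_by",
)


def check_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    for slot in REQUIRED_SLOTS:
        if record.get(slot) is None:
            problems.append(f"missing required slot: {slot}")
    for slot in MULTIVALUED_SLOTS:
        if slot in record and not isinstance(record[slot], list):
            problems.append(f"slot should be a list: {slot}")
    return problems


# Illustrative record; the values are made up for this example.
record = {
    "dedup_id": "dedup-001",
    "input_record_count": 1200,
    "unique_record_count": 950,
    "duplicate_count": 250,
    "dedup_method": "automated",
    "dedup_criteria": "DOI",  # wrong shape: multivalued slots expect a list
}
print(check_record(record))
```

A check like this is only a quick pre-flight; range checks (for example that `dedup_method` is a valid DeduplicationMethod) are left to schema-aware tooling.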

Induced

name: DeduplicationProcess
description: Process for identifying and removing duplicate records
from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
slot_usage:
  dedup_id:
    name: dedup_id
    description: Unique identifier for the deduplication process
    identifier: true
    range: string
  input_record_count:
    name: input_record_count
    description: Total number of records before deduplication
    range: integer
    required: true
  unique_record_count:
    name: unique_record_count
    description: Number of unique records after deduplication
    range: integer
    required: true
  duplicate_count:
    name: duplicate_count
    description: Number of duplicate records identified
    range: integer
    required: true
  dedup_method:
    name: dedup_method
    description: Method used for deduplication
    range: DeduplicationMethod
    required: true
  dedup_tools:
    name: dedup_tools
    description: External tools used for deduplication
    range: ExternalTool
    multivalued: true
  dedup_criteria:
    name: dedup_criteria
    description: Criteria used to identify duplicates (e.g., DOI, title similarity)
    range: string
    multivalued: true
  duplicate_groups:
    name: duplicate_groups
    description: Groups of identified duplicate records
    range: DuplicateGroup
    multivalued: true
  dedup_started_at:
    name: dedup_started_at
    description: When deduplication started
    range: datetime
  dedup_completed_at:
    name: dedup_completed_at
    description: When deduplication completed
    range: datetime
  dedup_performed_by:
    name: dedup_performed_by
    description: Who performed the deduplication
    range: Author
    multivalued: true
  dedup_notes:
    name: dedup_notes
    description: Additional notes about the deduplication process
    range: string
attributes:
  dedup_id:
    name: dedup_id
    description: Unique identifier for the deduplication process
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    identifier: true
    alias: dedup_id
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: string
  input_record_count:
    name: input_record_count
    description: Total number of records before deduplication
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    alias: input_record_count
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: integer
    required: true
  unique_record_count:
    name: unique_record_count
    description: Number of unique records after deduplication
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    alias: unique_record_count
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: integer
    required: true
  duplicate_count:
    name: duplicate_count
    description: Number of duplicate records identified
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    alias: duplicate_count
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: integer
    required: true
  dedup_method:
    name: dedup_method
    description: Method used for deduplication
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    alias: dedup_method
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: DeduplicationMethod
    required: true
  dedup_tools:
    name: dedup_tools
    description: External tools used for deduplication
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    alias: dedup_tools
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: ExternalTool
    multivalued: true
  dedup_criteria:
    name: dedup_criteria
    description: Criteria used to identify duplicates (e.g., DOI, title similarity)
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    alias: dedup_criteria
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: string
    multivalued: true
  duplicate_groups:
    name: duplicate_groups
    description: Groups of identified duplicate records
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    alias: duplicate_groups
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: DuplicateGroup
    multivalued: true
  dedup_started_at:
    name: dedup_started_at
    description: When deduplication started
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    alias: dedup_started_at
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: datetime
  dedup_completed_at:
    name: dedup_completed_at
    description: When deduplication completed
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    alias: dedup_completed_at
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: datetime
  dedup_performed_by:
    name: dedup_performed_by
    description: Who performed the deduplication
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    alias: dedup_performed_by
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: Author
    multivalued: true
  dedup_notes:
    name: dedup_notes
    description: Additional notes about the deduplication process
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    alias: dedup_notes
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: string