Class: DeduplicationProcess
Process for identifying and removing duplicate records
URI: revaise:DeduplicationProcess
```mermaid
classDiagram
class DeduplicationProcess
click DeduplicationProcess href "../DeduplicationProcess/"
DeduplicationProcess : dedup_completed_at
DeduplicationProcess : dedup_criteria
DeduplicationProcess : dedup_id
DeduplicationProcess : dedup_method
DeduplicationProcess --> "1" DeduplicationMethod : dedup_method
click DeduplicationMethod href "../DeduplicationMethod/"
DeduplicationProcess : dedup_notes
DeduplicationProcess : dedup_performed_by
DeduplicationProcess --> "*" Author : dedup_performed_by
click Author href "../Author/"
DeduplicationProcess : dedup_started_at
DeduplicationProcess : dedup_tools
DeduplicationProcess --> "*" ExternalTool : dedup_tools
click ExternalTool href "../ExternalTool/"
DeduplicationProcess : duplicate_count
DeduplicationProcess : duplicate_groups
DeduplicationProcess --> "*" DuplicateGroup : duplicate_groups
click DuplicateGroup href "../DuplicateGroup/"
DeduplicationProcess : input_record_count
DeduplicationProcess : unique_record_count
```
Slots
| Name | Cardinality and Range | Description | Inheritance |
|---|---|---|---|
| dedup_id | 1 String | Unique identifier for the deduplication process | direct |
| input_record_count | 1 Integer | Total number of records before deduplication | direct |
| unique_record_count | 1 Integer | Number of unique records after deduplication | direct |
| duplicate_count | 1 Integer | Number of duplicate records identified | direct |
| dedup_method | 1 DeduplicationMethod | Method used for deduplication | direct |
| dedup_tools | * ExternalTool | External tools used for deduplication | direct |
| dedup_criteria | * String | Criteria used to identify duplicates (e.g., DOI, title similarity) | direct |
| duplicate_groups | * DuplicateGroup | Groups of identified duplicate records | direct |
| dedup_started_at | 0..1 Datetime | When deduplication started | direct |
| dedup_completed_at | 0..1 Datetime | When deduplication completed | direct |
| dedup_performed_by | * Author | Who performed the deduplication | direct |
| dedup_notes | 0..1 String | Additional notes about the deduplication process | direct |
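As a rough illustration of how the cardinalities in the slot table might be mirrored in application code, the sketch below models the class as a Python dataclass and checks a raw record for missing required slots. It is not generated from the schema. The names `REQUIRED_SLOTS` and `missing_required`, the example values, and the choice to represent `DeduplicationMethod`, `ExternalTool`, `DuplicateGroup`, and `Author` as plain strings are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List, Optional

# Slots listed with cardinality 1 in the slot table.
REQUIRED_SLOTS = (
    "dedup_id",
    "input_record_count",
    "unique_record_count",
    "duplicate_count",
    "dedup_method",
)


@dataclass
class DeduplicationProcess:
    """Illustrative mirror of the slot table; not generated from the schema."""

    dedup_id: str
    input_record_count: int
    unique_record_count: int
    duplicate_count: int
    # Assumption: the DeduplicationMethod value is carried as a plain string.
    dedup_method: str
    # Assumption: class-ranged slots are held as string references here.
    dedup_tools: List[str] = field(default_factory=list)
    dedup_criteria: List[str] = field(default_factory=list)
    duplicate_groups: List[str] = field(default_factory=list)
    dedup_started_at: Optional[datetime] = None
    dedup_completed_at: Optional[datetime] = None
    dedup_performed_by: List[str] = field(default_factory=list)
    dedup_notes: Optional[str] = None


def missing_required(record: Dict[str, Any]) -> List[str]:
    """Return the required slot names that are absent or None in a raw record."""
    return [name for name in REQUIRED_SLOTS if record.get(name) is None]


# Hypothetical record with made-up values; dedup_method is deliberately omitted.
raw = {
    "dedup_id": "dedup-001",
    "input_record_count": 1200,
    "unique_record_count": 950,
    "duplicate_count": 250,
    "dedup_criteria": ["DOI", "title similarity"],
}
print(missing_required(raw))  # ['dedup_method']
```

Multivalued slots default to empty lists and optional slots to `None`, so a record that supplies only the five required values still constructs cleanly.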
Usages
| used by | used in | type | used |
|---|---|---|---|
| ScreeningStage | deduplication_process | range | DeduplicationProcess |
Identifier and Mapping Information
Schema Source
- from schema: https://open-and-sustainable.github.io/revaise-model/schema
Mappings
| Mapping Type | Mapped Value |
|---|---|
| self | revaise:DeduplicationProcess |
| native | revaise:DeduplicationProcess |
LinkML Source
Direct
name: DeduplicationProcess
description: Process for identifying and removing duplicate records
from_schema: https://open-and-sustainable.github.io/revaise-model/schema
slots:
- dedup_id
- input_record_count
- unique_record_count
- duplicate_count
- dedup_method
- dedup_tools
- dedup_criteria
- duplicate_groups
- dedup_started_at
- dedup_completed_at
- dedup_performed_by
- dedup_notes
slot_usage:
  dedup_id:
    name: dedup_id
    description: Unique identifier for the deduplication process
    identifier: true
    range: string
  input_record_count:
    name: input_record_count
    description: Total number of records before deduplication
    range: integer
    required: true
  unique_record_count:
    name: unique_record_count
    description: Number of unique records after deduplication
    range: integer
    required: true
  duplicate_count:
    name: duplicate_count
    description: Number of duplicate records identified
    range: integer
    required: true
  dedup_method:
    name: dedup_method
    description: Method used for deduplication
    range: DeduplicationMethod
    required: true
  dedup_tools:
    name: dedup_tools
    description: External tools used for deduplication
    range: ExternalTool
    multivalued: true
  dedup_criteria:
    name: dedup_criteria
    description: Criteria used to identify duplicates (e.g., DOI, title similarity)
    range: string
    multivalued: true
  duplicate_groups:
    name: duplicate_groups
    description: Groups of identified duplicate records
    range: DuplicateGroup
    multivalued: true
  dedup_started_at:
    name: dedup_started_at
    description: When deduplication started
    range: datetime
  dedup_completed_at:
    name: dedup_completed_at
    description: When deduplication completed
    range: datetime
  dedup_performed_by:
    name: dedup_performed_by
    description: Who performed the deduplication
    range: Author
    multivalued: true
  dedup_notes:
    name: dedup_notes
    description: Additional notes about the deduplication process
    range: string
Induced
name: DeduplicationProcess
description: Process for identifying and removing duplicate records
from_schema: https://open-and-sustainable.github.io/revaise-model/schema
slot_usage:
  dedup_id:
    name: dedup_id
    description: Unique identifier for the deduplication process
    identifier: true
    range: string
  input_record_count:
    name: input_record_count
    description: Total number of records before deduplication
    range: integer
    required: true
  unique_record_count:
    name: unique_record_count
    description: Number of unique records after deduplication
    range: integer
    required: true
  duplicate_count:
    name: duplicate_count
    description: Number of duplicate records identified
    range: integer
    required: true
  dedup_method:
    name: dedup_method
    description: Method used for deduplication
    range: DeduplicationMethod
    required: true
  dedup_tools:
    name: dedup_tools
    description: External tools used for deduplication
    range: ExternalTool
    multivalued: true
  dedup_criteria:
    name: dedup_criteria
    description: Criteria used to identify duplicates (e.g., DOI, title similarity)
    range: string
    multivalued: true
  duplicate_groups:
    name: duplicate_groups
    description: Groups of identified duplicate records
    range: DuplicateGroup
    multivalued: true
  dedup_started_at:
    name: dedup_started_at
    description: When deduplication started
    range: datetime
  dedup_completed_at:
    name: dedup_completed_at
    description: When deduplication completed
    range: datetime
  dedup_performed_by:
    name: dedup_performed_by
    description: Who performed the deduplication
    range: Author
    multivalued: true
  dedup_notes:
    name: dedup_notes
    description: Additional notes about the deduplication process
    range: string
attributes:
  dedup_id:
    name: dedup_id
    description: Unique identifier for the deduplication process
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema
    rank: 1000
    identifier: true
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: string
    required: true
  input_record_count:
    name: input_record_count
    description: Total number of records before deduplication
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema
    rank: 1000
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: integer
    required: true
  unique_record_count:
    name: unique_record_count
    description: Number of unique records after deduplication
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema
    rank: 1000
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: integer
    required: true
  duplicate_count:
    name: duplicate_count
    description: Number of duplicate records identified
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema
    rank: 1000
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: integer
    required: true
  dedup_method:
    name: dedup_method
    description: Method used for deduplication
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema
    rank: 1000
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: DeduplicationMethod
    required: true
  dedup_tools:
    name: dedup_tools
    description: External tools used for deduplication
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema
    rank: 1000
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: ExternalTool
    multivalued: true
  dedup_criteria:
    name: dedup_criteria
    description: Criteria used to identify duplicates (e.g., DOI, title similarity)
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema
    rank: 1000
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: string
    multivalued: true
  duplicate_groups:
    name: duplicate_groups
    description: Groups of identified duplicate records
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema
    rank: 1000
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: DuplicateGroup
    multivalued: true
  dedup_started_at:
    name: dedup_started_at
    description: When deduplication started
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema
    rank: 1000
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: datetime
  dedup_completed_at:
    name: dedup_completed_at
    description: When deduplication completed
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema
    rank: 1000
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: datetime
  dedup_performed_by:
    name: dedup_performed_by
    description: Who performed the deduplication
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema
    rank: 1000
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: Author
    multivalued: true
  dedup_notes:
    name: dedup_notes
    description: Additional notes about the deduplication process
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema
    rank: 1000
    owner: DeduplicationProcess
    domain_of:
    - DeduplicationProcess
    range: string