Class: DuplicateGroup
Group of records identified as duplicates
```mermaid
classDiagram
    class DuplicateGroup
    click DuplicateGroup href "../DuplicateGroup/"
    DuplicateGroup : duplicate_record_ids
    DuplicateGroup : group_id
    DuplicateGroup : master_record_id
    DuplicateGroup : match_fields
    DuplicateGroup : resolution_method
    DuplicateGroup --> "0..1" ResolutionMethod : resolution_method
    click ResolutionMethod href "../ResolutionMethod/"
    DuplicateGroup : similarity_score
```
Slots
| Name | Cardinality and Range | Description | Inheritance |
|---|---|---|---|
| group_id | 1 String | Unique identifier for the duplicate group | direct |
| master_record_id | 1 String | ID of the record kept as the master | direct |
| duplicate_record_ids | 1..* String | IDs of records identified as duplicates | direct |
| similarity_score | 0..1 Float | Similarity score between duplicates (0-1) | direct |
| match_fields | * String | Fields that matched between duplicates | direct |
| resolution_method | 0..1 ResolutionMethod | Method used to resolve the conflict | direct |
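
To make the cardinalities in the slot table concrete, here is a minimal Python sketch of a record with the same six slots. It is an illustration only, not code generated from the schema. `ResolutionMethod` is defined elsewhere in the schema and its permissible values are not listed on this page, so a plain string stands in for it. The range check on `similarity_score` is an assumption taken from the slot description "(0-1)"; the schema source declares no numeric bounds.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DuplicateGroup:
    """Illustrative stand-in for the DuplicateGroup class (not generated code)."""

    group_id: str                                           # cardinality 1, identifier
    master_record_id: str                                   # cardinality 1
    duplicate_record_ids: List[str]                         # cardinality 1..*
    similarity_score: Optional[float] = None                # cardinality 0..1
    match_fields: List[str] = field(default_factory=list)   # cardinality *
    # Placeholder type: the real range is ResolutionMethod, whose definition
    # is not shown on this page.
    resolution_method: Optional[str] = None                 # cardinality 0..1

    def __post_init__(self) -> None:
        if not self.group_id or not self.master_record_id:
            raise ValueError("group_id and master_record_id are required")
        if not self.duplicate_record_ids:
            raise ValueError("duplicate_record_ids needs at least one entry")
        # Assumption: the 0-1 bound comes from the slot description only.
        if self.similarity_score is not None and not (0.0 <= self.similarity_score <= 1.0):
            raise ValueError("similarity_score is expected to lie in [0, 1]")


# Hypothetical example values.
group = DuplicateGroup(
    group_id="dg-001",
    master_record_id="rec-17",
    duplicate_record_ids=["rec-42", "rec-58"],
    similarity_score=0.93,
    match_fields=["title", "doi"],
)
```

The optional slots default to `None` or an empty list, mirroring the `0..1` and `*` cardinalities, while the three required slots must be supplied at construction.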
Usages
| used by | used in | type | used |
|---|---|---|---|
| DeduplicationProcess | duplicate_groups | range | DuplicateGroup |
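
As a rough illustration of how a consumer might read the `duplicate_groups` slot of a `DeduplicationProcess`, the sketch below collects the record IDs that would be dropped once each group's master record is kept. The function name `records_to_drop`, the dictionary layout, and the rule that a master record is never dropped are assumptions made for this example; the schema does not prescribe them.

```python
from typing import Dict, Iterable, List, Set


def records_to_drop(duplicate_groups: Iterable[Dict]) -> Set[str]:
    """Return IDs flagged as duplicates, excluding any ID that is a master."""
    groups = list(duplicate_groups)
    masters = {g["master_record_id"] for g in groups}
    dropped: Set[str] = set()
    for g in groups:
        dropped.update(g["duplicate_record_ids"])
    # Assumption: a record kept as master in one group is never dropped.
    return dropped - masters


# Hypothetical content of DeduplicationProcess.duplicate_groups.
duplicate_groups: List[Dict] = [
    {"group_id": "dg-001", "master_record_id": "rec-17",
     "duplicate_record_ids": ["rec-42", "rec-58"]},
    {"group_id": "dg-002", "master_record_id": "rec-03",
     "duplicate_record_ids": ["rec-09"]},
]
to_drop = records_to_drop(duplicate_groups)
```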
Identifier and Mapping Information
Schema Source
- from schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
Mappings
| Mapping Type | Mapped Value |
|---|---|
| self | revaise:DuplicateGroup |
| native | revaise:DuplicateGroup |
LinkML Source
Direct
name: DuplicateGroup
description: Group of records identified as duplicates
from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
slots:
- group_id
- master_record_id
- duplicate_record_ids
- similarity_score
- match_fields
- resolution_method
slot_usage:
  group_id:
    name: group_id
    description: Unique identifier for the duplicate group
    identifier: true
    range: string
  master_record_id:
    name: master_record_id
    description: ID of the record kept as the master
    range: string
    required: true
  duplicate_record_ids:
    name: duplicate_record_ids
    description: IDs of records identified as duplicates
    range: string
    required: true
    multivalued: true
  similarity_score:
    name: similarity_score
    description: Similarity score between duplicates (0-1)
    range: float
  match_fields:
    name: match_fields
    description: Fields that matched between duplicates
    range: string
    multivalued: true
  duplicate_resolution:
    name: duplicate_resolution
    description: How the duplicate was resolved
    range: string
Induced
name: DuplicateGroup
description: Group of records identified as duplicates
from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
slot_usage:
  group_id:
    name: group_id
    description: Unique identifier for the duplicate group
    identifier: true
    range: string
  master_record_id:
    name: master_record_id
    description: ID of the record kept as the master
    range: string
    required: true
  duplicate_record_ids:
    name: duplicate_record_ids
    description: IDs of records identified as duplicates
    range: string
    required: true
    multivalued: true
  similarity_score:
    name: similarity_score
    description: Similarity score between duplicates (0-1)
    range: float
  match_fields:
    name: match_fields
    description: Fields that matched between duplicates
    range: string
    multivalued: true
  duplicate_resolution:
    name: duplicate_resolution
    description: How the duplicate was resolved
    range: string
attributes:
  group_id:
    name: group_id
    description: Unique identifier for the duplicate group
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    identifier: true
    owner: DuplicateGroup
    domain_of:
    - DuplicateGroup
    range: string
    required: true
  master_record_id:
    name: master_record_id
    description: ID of the record kept as the master
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    owner: DuplicateGroup
    domain_of:
    - DuplicateGroup
    range: string
    required: true
  duplicate_record_ids:
    name: duplicate_record_ids
    description: IDs of records identified as duplicates
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    owner: DuplicateGroup
    domain_of:
    - DuplicateGroup
    range: string
    required: true
    multivalued: true
  similarity_score:
    name: similarity_score
    description: Similarity score between duplicates (0-1)
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    owner: DuplicateGroup
    domain_of:
    - DuplicateGroup
    range: float
  match_fields:
    name: match_fields
    description: Fields that matched between duplicates
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    owner: DuplicateGroup
    domain_of:
    - DuplicateGroup
    range: string
    multivalued: true
  resolution_method:
    name: resolution_method
    description: Method used to resolve the conflict
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    owner: DuplicateGroup
    domain_of:
    - DuplicateGroup
    - ConflictResolution
    range: ResolutionMethod
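
The cardinalities shown in the Slots table can be read off the `required` and `multivalued` flags of the induced attributes. The sketch below shows one way to derive them. The helper name `cardinality` and the trimmed-down dictionary are assumptions for illustration (only the flags relevant to cardinality are copied from the induced source above), and this is not the documentation generator's own code.

```python
from typing import Dict


def cardinality(slot: Dict) -> str:
    """Map required/multivalued flags to a cardinality string."""
    required = bool(slot.get("required", False))
    multivalued = bool(slot.get("multivalued", False))
    if multivalued:
        return "1..*" if required else "*"
    return "1" if required else "0..1"


# Flags copied from the induced attributes; other keys omitted for brevity.
induced_attributes = {
    "group_id": {"range": "string", "identifier": True, "required": True},
    "master_record_id": {"range": "string", "required": True},
    "duplicate_record_ids": {"range": "string", "required": True, "multivalued": True},
    "similarity_score": {"range": "float"},
    "match_fields": {"range": "string", "multivalued": True},
    "resolution_method": {"range": "ResolutionMethod"},
}

table = {name: cardinality(slot) for name, slot in induced_attributes.items()}
```

Absent flags are treated as false, which is why `similarity_score` and `resolution_method` come out as `0..1` and `match_fields` as `*`.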