Class: DuplicateGroup

Group of records identified as duplicates

URI: revaise:DuplicateGroup

```mermaid
classDiagram
    class DuplicateGroup
    click DuplicateGroup href "../DuplicateGroup/"
      DuplicateGroup : duplicate_record_ids
      DuplicateGroup : group_id
      DuplicateGroup : master_record_id
      DuplicateGroup : match_fields
      DuplicateGroup : resolution_method
      DuplicateGroup --> "0..1" ResolutionMethod : resolution_method
      click ResolutionMethod href "../ResolutionMethod/"
      DuplicateGroup : similarity_score
```

Slots

| Name | Cardinality and Range | Description | Inheritance |
| --- | --- | --- | --- |
| group_id | 1 String | Unique identifier for the duplicate group | direct |
| master_record_id | 1 String | ID of the record kept as the master | direct |
| duplicate_record_ids | 1..* String | IDs of records identified as duplicates | direct |
| similarity_score | 0..1 Float | Similarity score between duplicates (0-1) | direct |
| match_fields | * String | Fields that matched between duplicates | direct |
| resolution_method | 0..1 ResolutionMethod | Method used to resolve the conflict | direct |
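
The cardinalities listed for the slots can be read as simple construction-time checks. The following is a minimal hand-written sketch, not code generated from the schema: the class layout, the error messages, and the sample IDs are illustrative assumptions. The 0 to 1 bound on `similarity_score` is taken from the slot description only, and `resolution_method` is held as a plain string because the values of `ResolutionMethod` are not shown on this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DuplicateGroup:
    """Hand-written mirror of the DuplicateGroup slots (not generated code)."""

    group_id: str                    # cardinality 1, identifier
    master_record_id: str            # cardinality 1
    duplicate_record_ids: List[str]  # cardinality 1..*
    similarity_score: Optional[float] = None               # cardinality 0..1
    match_fields: List[str] = field(default_factory=list)  # cardinality *
    # Assumption: ResolutionMethod is defined on its own page and its values
    # are not shown here, so a plain string stands in for it.
    resolution_method: Optional[str] = None                # cardinality 0..1

    def __post_init__(self) -> None:
        if not self.group_id:
            raise ValueError("group_id is required")
        if not self.master_record_id:
            raise ValueError("master_record_id is required")
        if not self.duplicate_record_ids:
            raise ValueError("duplicate_record_ids needs at least one ID")
        # Assumption: the 0-1 bound comes from the slot description only.
        score = self.similarity_score
        if score is not None and not 0.0 <= score <= 1.0:
            raise ValueError("similarity_score is expected to lie in [0, 1]")


# Hypothetical instance; all IDs and field names are made up for illustration.
group = DuplicateGroup(
    group_id="dup-001",
    master_record_id="rec-10",
    duplicate_record_ids=["rec-11", "rec-12"],
    similarity_score=0.93,
    match_fields=["title", "doi"],
)
```

Optional slots default to `None` or an empty list, so only the three slots with a minimum cardinality of 1 have to be supplied.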

Usages

| used by | used in | type | used |
| --- | --- | --- | --- |
| DeduplicationProcess | duplicate_groups | range | DuplicateGroup |
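
Since `DeduplicationProcess` holds `DuplicateGroup` values in its `duplicate_groups` slot, a consumer might walk that slot to work out which records to drop. The sketch below assumes instance data loaded as plain dictionaries; the helper name `records_to_remove`, the sample IDs, and the `process` dictionary are hypothetical, and the other slots of `DeduplicationProcess` are omitted because they are not shown on this page. The page does not say whether `duplicate_record_ids` may include the master record, so the helper filters it out in either case.

```python
from typing import Dict, Iterable, Set


def records_to_remove(duplicate_groups: Iterable[Dict]) -> Set[str]:
    """Collect every duplicate ID across groups, keeping each master record."""
    removed: Set[str] = set()
    for group in duplicate_groups:
        master = group["master_record_id"]
        # Skip the master in case it is also listed among the duplicates.
        removed.update(
            rid for rid in group["duplicate_record_ids"] if rid != master
        )
    return removed


# Hypothetical instance data; only the duplicate_groups slot is shown.
process = {
    "duplicate_groups": [
        {
            "group_id": "dup-001",
            "master_record_id": "rec-10",
            "duplicate_record_ids": ["rec-10", "rec-11"],
        },
        {
            "group_id": "dup-002",
            "master_record_id": "rec-20",
            "duplicate_record_ids": ["rec-21", "rec-22"],
        },
    ]
}

removed = records_to_remove(process["duplicate_groups"])
print(sorted(removed))  # ['rec-11', 'rec-21', 'rec-22']
```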

Identifier and Mapping Information

Schema Source

  • from schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening

Mappings

| Mapping Type | Mapped Value |
| --- | --- |
| self | revaise:DuplicateGroup |
| native | revaise:DuplicateGroup |

LinkML Source

Direct

```yaml
name: DuplicateGroup
description: Group of records identified as duplicates
from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
slots:
- group_id
- master_record_id
- duplicate_record_ids
- similarity_score
- match_fields
- resolution_method
slot_usage:
  group_id:
    name: group_id
    description: Unique identifier for the duplicate group
    identifier: true
    range: string
  master_record_id:
    name: master_record_id
    description: ID of the record kept as the master
    range: string
    required: true
  duplicate_record_ids:
    name: duplicate_record_ids
    description: IDs of records identified as duplicates
    range: string
    required: true
    multivalued: true
  similarity_score:
    name: similarity_score
    description: Similarity score between duplicates (0-1)
    range: float
  match_fields:
    name: match_fields
    description: Fields that matched between duplicates
    range: string
    multivalued: true
  duplicate_resolution:
    name: duplicate_resolution
    description: How the duplicate was resolved
    range: string
```

Induced

```yaml
name: DuplicateGroup
description: Group of records identified as duplicates
from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
slot_usage:
  group_id:
    name: group_id
    description: Unique identifier for the duplicate group
    identifier: true
    range: string
  master_record_id:
    name: master_record_id
    description: ID of the record kept as the master
    range: string
    required: true
  duplicate_record_ids:
    name: duplicate_record_ids
    description: IDs of records identified as duplicates
    range: string
    required: true
    multivalued: true
  similarity_score:
    name: similarity_score
    description: Similarity score between duplicates (0-1)
    range: float
  match_fields:
    name: match_fields
    description: Fields that matched between duplicates
    range: string
    multivalued: true
  duplicate_resolution:
    name: duplicate_resolution
    description: How the duplicate was resolved
    range: string
attributes:
  group_id:
    name: group_id
    description: Unique identifier for the duplicate group
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    identifier: true
    alias: group_id
    owner: DuplicateGroup
    domain_of:
    - DuplicateGroup
    range: string
  master_record_id:
    name: master_record_id
    description: ID of the record kept as the master
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    alias: master_record_id
    owner: DuplicateGroup
    domain_of:
    - DuplicateGroup
    range: string
    required: true
  duplicate_record_ids:
    name: duplicate_record_ids
    description: IDs of records identified as duplicates
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    alias: duplicate_record_ids
    owner: DuplicateGroup
    domain_of:
    - DuplicateGroup
    range: string
    required: true
    multivalued: true
  similarity_score:
    name: similarity_score
    description: Similarity score between duplicates (0-1)
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    alias: similarity_score
    owner: DuplicateGroup
    domain_of:
    - DuplicateGroup
    range: float
  match_fields:
    name: match_fields
    description: Fields that matched between duplicates
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    alias: match_fields
    owner: DuplicateGroup
    domain_of:
    - DuplicateGroup
    range: string
    multivalued: true
  resolution_method:
    name: resolution_method
    description: Method used to resolve the conflict
    from_schema: https://open-and-sustainable.github.io/revaise-model/schema/stages/screening
    rank: 1000
    alias: resolution_method
    owner: DuplicateGroup
    domain_of:
    - DuplicateGroup
    - ConflictResolution
    range: ResolutionMethod
```