How Data & More uses false positives and false negatives to improve classification
Data classification
The Data & More data classification team continuously works to improve privacy classification. It's crucial to accurately identify and classify all types of privacy or security data while minimizing the number of misclassified items. To support this goal, Data & More has established the following methods for managing two types of misclassified data: false positives and false negatives.
How to manage: False positives
Definition:
Files that Data & More has classified as privacy or security data but that are not actually privacy or security data, i.e., “misclassified.”
Process:
- In the D&M cleanup process, users can mark a classified file as “misclassified.” Marking a classified file as misclassified means it is a false positive.
- These “misclassified” files are excluded from the cleanup report.
- During maintenance, the Data & More team will review these files.
- Note: The data is not moved or copied from the installation. The D&M team only reviews the data to improve the general classification logic.
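As a rough illustration of the exclusion step, the sketch below separates files marked as misclassified from the rest of a cleanup report and sets them aside for review. The `ClassifiedFile` record, its fields, and `split_cleanup_report` are hypothetical names chosen for this example; they are not part of the Data & More product, and the file paths are made up.

```python
from dataclasses import dataclass


@dataclass
class ClassifiedFile:
    """Hypothetical record for a file the classifier has flagged."""
    path: str
    category: str                        # e.g. "privacy" or "security"
    marked_misclassified: bool = False   # set by a user during cleanup


def split_cleanup_report(files):
    """Separate cleanup-report entries from reported false positives.

    Returns (report, review_queue): files marked as misclassified are
    left out of the report and queued for review instead.
    """
    report = [f for f in files if not f.marked_misclassified]
    review_queue = [f for f in files if f.marked_misclassified]
    return report, review_queue


# Illustrative data only.
files = [
    ClassifiedFile("hr/contract.docx", "privacy"),
    ClassifiedFile("marketing/brochure.pdf", "privacy", marked_misclassified=True),
]
report, review_queue = split_cleanup_report(files)
print([f.path for f in report])        # ['hr/contract.docx']
print([f.path for f in review_queue])  # ['marketing/brochure.pdf']
```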
How to manage: False negatives
Definition:
Files that Data & More has not classified as privacy or security data but that should have been classified as such, i.e., “misclassified.”
Process:
- False negatives are harder to detect because the files were never flagged in the first place.
- To help identify false negatives without copying data out of the system, users should:
  - Create a folder in the repository.
  - Name the folder: DM false negatives.
  - Copy samples of the files that should have been classified into that folder.
- The Data & More team will review this data during maintenance and use it to enhance the overall classification process.
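For repositories that are reachable as a file system path, the folder steps above could be scripted. The following is a minimal sketch using only the Python standard library; the function name `collect_false_negatives` and its parameters are assumptions made for this example, and only the folder name comes from the process described here.

```python
import shutil
from pathlib import Path

FOLDER_NAME = "DM false negatives"  # folder name given in the process above


def collect_false_negatives(repository_root, sample_paths):
    """Copy sample files into the 'DM false negatives' folder.

    The folder is created inside the repository, so the samples stay
    within the same system. The original files are left in place.
    """
    target = Path(repository_root) / FOLDER_NAME
    target.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    copied = []
    for sample in map(Path, sample_paths):
        destination = target / sample.name
        shutil.copy2(sample, destination)  # copies content and timestamps
        copied.append(destination)
    return copied
```

Note that in this sketch two samples with the same file name would overwrite each other in the target folder, so samples may need to be renamed first if names collide.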