Feature: Incremental Scanning with File and App Metadata Cache #10

@gamepop

Summary

Cache file and app metadata so that Deep Scans can run incrementally, using the NTFS USN Journal to find files that changed since the last scan. This is expected to cut a typical Deep Scan from 15-45 minutes to 1-3 minutes.

Motivation

Currently, every Deep Scan performs a full filesystem walk and registry query, which is slow:

  • File discovery (500K files): 5-15 minutes
  • App discovery (200 apps): 30-60 seconds
  • Duplicate detection: 10-30 minutes

By caching metadata and reusing the existing USN Journal reader, later scans only need to process what changed since the previous scan.

Proposed Implementation

New Models

CachedFileInfo

  • Path, FileName, Extension, SizeBytes
  • LastModified, LastAccessed
  • ContentHash (for duplicate detection)
  • FileCategory (Video, Photo, Document, etc.)
  • UsnNumber, CachedAt (for incremental updates)
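A minimal sketch of the CachedFileInfo model, written in Python for illustration only (the issue does not show the project's own code). The extension-to-category table and the make_cached_file_info helper are hypothetical. ContentHash starts empty so that hashing can be deferred until duplicate detection actually needs it.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import PureWindowsPath
from typing import Optional

# Hypothetical extension-to-category mapping; the issue does not define one.
CATEGORY_BY_EXTENSION = {
    ".mp4": "Video", ".mkv": "Video", ".avi": "Video",
    ".jpg": "Photo", ".jpeg": "Photo", ".png": "Photo", ".heic": "Photo",
    ".pdf": "Document", ".docx": "Document", ".txt": "Document",
}


@dataclass
class CachedFileInfo:
    path: str
    file_name: str
    extension: str
    size_bytes: int
    last_modified: datetime
    last_accessed: datetime
    content_hash: Optional[str]  # filled lazily, only when duplicates are checked
    file_category: str
    usn_number: int              # USN of the last journal record applied to this entry
    cached_at: datetime


def make_cached_file_info(path: str, size_bytes: int, last_modified: datetime,
                          last_accessed: datetime, usn_number: int) -> CachedFileInfo:
    """Build a cache entry from stat-like data (hypothetical helper)."""
    p = PureWindowsPath(path)
    ext = p.suffix.lower()  # normalize so ".MP4" and ".mp4" share a category
    return CachedFileInfo(
        path=str(p),
        file_name=p.name,
        extension=ext,
        size_bytes=size_bytes,
        last_modified=last_modified,
        last_accessed=last_accessed,
        content_hash=None,
        file_category=CATEGORY_BY_EXTENSION.get(ext, "Other"),
        usn_number=usn_number,
        cached_at=datetime.now(timezone.utc),
    )
```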

CachedAppInfo

  • Id (Registry key or Package ID), Name, Publisher, Version
  • Source, TotalSizeBytes, InstallPath
  • LastUsed, Category
  • CachedAt, RegistryHash (to detect changes)
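One way RegistryHash could be computed, sketched under the assumption that an app's registry values (for example DisplayName and DisplayVersion under its Uninstall key) have already been read into a dictionary. The registry_hash function is hypothetical. It sorts value names case-insensitively, so the order in which values are enumerated does not change the fingerprint.

```python
import hashlib


def registry_hash(values: dict) -> str:
    """Order-independent SHA-256 fingerprint of a registry key's values (hypothetical)."""
    h = hashlib.sha256()
    # Registry value names are case-insensitive, so sort and hash them lowercased.
    for name in sorted(values, key=str.lower):
        h.update(name.lower().encode("utf-8"))
        h.update(b"\x00")  # separator between name and value
        # repr() keeps the type visible, so the int 5 and the string "5" differ.
        h.update(repr(values[name]).encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()
```

If the stored hash still matches on the next scan, the app's cached size and metadata can be reused without recomputing them.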

Storage: SQLite Database

  • Fast indexed lookups by path, extension, and content hash
  • ~100MB for 500K files (roughly 200 bytes per file)
  • B-tree index on the content hash for duplicate lookup
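A sketch of a possible SQLite layout and the index-backed duplicate query, using Python's built-in sqlite3 module. Table, column, and index names are assumptions. SQLite stores indexes as B-trees, so the GROUP BY over content_hash can use ix_files_hash rather than sorting the whole table.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    path          TEXT PRIMARY KEY COLLATE NOCASE,  -- NOCASE folds ASCII letters only
    file_name     TEXT NOT NULL,
    extension     TEXT NOT NULL,
    size_bytes    INTEGER NOT NULL,
    last_modified TEXT NOT NULL,
    last_accessed TEXT,
    content_hash  TEXT,                             -- NULL until the file is hashed
    file_category TEXT,
    usn_number    INTEGER NOT NULL,
    cached_at     TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS ix_files_extension ON files(extension);
CREATE INDEX IF NOT EXISTS ix_files_hash ON files(content_hash);
"""


def find_duplicate_groups(conn: sqlite3.Connection) -> dict:
    """Return {content_hash: [paths]} for every hash shared by two or more files."""
    rows = conn.execute(
        """SELECT content_hash, path FROM files
           WHERE content_hash IN (
               SELECT content_hash FROM files
               WHERE content_hash IS NOT NULL
               GROUP BY content_hash HAVING COUNT(*) > 1)
           ORDER BY content_hash, path"""
    ).fetchall()
    groups = {}
    for digest, path in rows:
        groups.setdefault(digest, []).append(path)
    return groups
```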

New Services

IFileMetadataStore

  • GetFileAsync, UpsertFileAsync, DeleteFileAsync
  • GetFilesByExtensionAsync, GetFilesByHashAsync
  • GetLastUsnAsync, SetLastUsnAsync
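A synchronous Python sketch of what a SQLite-backed IFileMetadataStore might look like. The class name, the reduced column set, and the scan_state table holding the last USN are all assumptions; the upsert uses SQLite's INSERT ... ON CONFLICT DO UPDATE syntax, which requires SQLite 3.24.0 or newer.

```python
import sqlite3
from typing import List, Optional


class SqliteFileMetadataStore:
    """Hypothetical Python analogue of IFileMetadataStore (sync instead of async)."""

    def __init__(self, db_path: str = ":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.executescript("""
            CREATE TABLE IF NOT EXISTS files (
                path TEXT PRIMARY KEY COLLATE NOCASE,
                extension TEXT NOT NULL,
                size_bytes INTEGER NOT NULL,
                content_hash TEXT,
                usn_number INTEGER NOT NULL);
            CREATE INDEX IF NOT EXISTS ix_files_extension ON files(extension);
            CREATE INDEX IF NOT EXISTS ix_files_hash ON files(content_hash);
            CREATE TABLE IF NOT EXISTS scan_state (key TEXT PRIMARY KEY, value INTEGER NOT NULL);
        """)

    def upsert_file(self, path, extension, size_bytes, content_hash, usn_number):
        with self.conn:  # commits on success, rolls back on exception
            self.conn.execute(
                """INSERT INTO files (path, extension, size_bytes, content_hash, usn_number)
                   VALUES (?, ?, ?, ?, ?)
                   ON CONFLICT(path) DO UPDATE SET
                       extension = excluded.extension,
                       size_bytes = excluded.size_bytes,
                       content_hash = excluded.content_hash,
                       usn_number = excluded.usn_number""",
                (path, extension, size_bytes, content_hash, usn_number))

    def get_file(self, path: str) -> Optional[tuple]:
        return self.conn.execute(
            "SELECT path, extension, size_bytes, content_hash, usn_number FROM files WHERE path = ?",
            (path,)).fetchone()

    def delete_file(self, path: str) -> None:
        with self.conn:
            self.conn.execute("DELETE FROM files WHERE path = ?", (path,))

    def get_files_by_extension(self, extension: str) -> List[str]:
        return [r[0] for r in self.conn.execute(
            "SELECT path FROM files WHERE extension = ? ORDER BY path", (extension,))]

    def get_files_by_hash(self, content_hash: str) -> List[str]:
        return [r[0] for r in self.conn.execute(
            "SELECT path FROM files WHERE content_hash = ? ORDER BY path", (content_hash,))]

    def get_last_usn(self) -> Optional[int]:
        row = self.conn.execute("SELECT value FROM scan_state WHERE key = 'last_usn'").fetchone()
        return row[0] if row else None  # None means no incremental checkpoint yet

    def set_last_usn(self, usn: int) -> None:
        with self.conn:
            self.conn.execute(
                """INSERT INTO scan_state (key, value) VALUES ('last_usn', ?)
                   ON CONFLICT(key) DO UPDATE SET value = excluded.value""", (usn,))
```

Keeping the USN checkpoint in the same database as the file rows lets a single transaction commit both together, so the cache and its checkpoint cannot drift apart.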

IAppMetadataStore

  • GetAllAppsAsync, UpsertAppAsync
  • HasChangedAsync (compare registry hash)
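A sketch of the IAppMetadataStore change check, again as a synchronous Python stand-in with a hypothetical class name and schema. The caller is assumed to compute the current registry hash; has_changed reports True for unknown apps and for apps whose stored hash differs, which are the only ones that need a full (and slow) size recalculation.

```python
import sqlite3


class SqliteAppMetadataStore:
    """Hypothetical Python analogue of IAppMetadataStore (sync instead of async)."""

    def __init__(self, db_path: str = ":memory:"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS apps (
                   id TEXT PRIMARY KEY,          -- registry key or package ID
                   name TEXT NOT NULL,
                   version TEXT,
                   total_size_bytes INTEGER,
                   registry_hash TEXT NOT NULL)""")

    def get_all_apps(self):
        return self.conn.execute(
            "SELECT id, name, version, total_size_bytes FROM apps ORDER BY id").fetchall()

    def upsert_app(self, app_id, name, version, total_size_bytes, registry_hash):
        with self.conn:
            self.conn.execute(
                """INSERT INTO apps (id, name, version, total_size_bytes, registry_hash)
                   VALUES (?, ?, ?, ?, ?)
                   ON CONFLICT(id) DO UPDATE SET
                       name = excluded.name,
                       version = excluded.version,
                       total_size_bytes = excluded.total_size_bytes,
                       registry_hash = excluded.registry_hash""",
                (app_id, name, version, total_size_bytes, registry_hash))

    def has_changed(self, app_id: str, current_hash: str) -> bool:
        row = self.conn.execute(
            "SELECT registry_hash FROM apps WHERE id = ?", (app_id,)).fetchone()
        # New apps and apps whose registry values changed both count as "changed".
        return row is None or row[0] != current_hash
```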

IncrementalFileScanner

  • Integrates with existing UsnJournalReader from SentinelService
  • Queries USN Journal for changes since last scan
  • Updates cache with deltas only
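A sketch of how journal records could be reduced to cache deltas. The USN_REASON_* values are the Windows SDK constants; the UsnRecord shape and collapse_changes are hypothetical. Real USN records carry a file reference number and a bare file name rather than a full path, so this sketch assumes the existing UsnJournalReader has already resolved paths. Only the last action per path matters, so a file created and then deleted between scans never reaches the cache.

```python
from dataclasses import dataclass

# USN_REASON_* flag values from the Windows SDK (winioctl.h).
USN_REASON_DATA_OVERWRITE = 0x00000001
USN_REASON_DATA_EXTEND = 0x00000002
USN_REASON_DATA_TRUNCATION = 0x00000004
USN_REASON_FILE_CREATE = 0x00000100
USN_REASON_FILE_DELETE = 0x00000200
USN_REASON_RENAME_OLD_NAME = 0x00001000
USN_REASON_RENAME_NEW_NAME = 0x00002000

GONE = USN_REASON_FILE_DELETE | USN_REASON_RENAME_OLD_NAME
NEEDS_REFRESH = (USN_REASON_DATA_OVERWRITE | USN_REASON_DATA_EXTEND |
                 USN_REASON_DATA_TRUNCATION | USN_REASON_FILE_CREATE |
                 USN_REASON_RENAME_NEW_NAME)


@dataclass
class UsnRecord:
    """Hypothetical record shape; assumes the reader already resolved the full path."""
    usn: int
    path: str
    reason: int


def collapse_changes(records, last_usn):
    """Reduce records newer than last_usn to one action per path.

    Returns (to_refresh, to_delete, new_last_usn). Paths in to_refresh must be
    re-read (and re-hashed if needed); paths in to_delete are dropped from the cache.
    """
    latest = {}
    new_last_usn = last_usn
    for rec in sorted(records, key=lambda r: r.usn):
        if rec.usn <= last_usn:
            continue  # already applied by an earlier scan
        key = rec.path.lower()  # NTFS names are case-insensitive (approximation)
        if rec.reason & GONE:
            latest[key] = ("delete", rec.path)   # deletion wins over earlier edits
        elif rec.reason & NEEDS_REFRESH:
            latest[key] = ("refresh", rec.path)
        new_last_usn = rec.usn  # advance even for ignored reasons (e.g. attribute-only)
    to_refresh = sorted(p for action, p in latest.values() if action == "refresh")
    to_delete = sorted(p for action, p in latest.values() if action == "delete")
    return to_refresh, to_delete, new_last_usn
```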

CacheMaintenanceService (Background)

  • Runs every hour to keep the cache warm
  • Processes the USN Journal in small batches
  • Keeps the cache close to current (at worst roughly one maintenance interval behind)
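A sketch of the background pass, assuming hypothetical callbacks: read_batch(start_usn, max_records) returns (records, next_usn) the way a journal read reports the next USN to request, and the checkpoint stores that "next USN to read". Saving the checkpoint after every batch means a crash or shutdown only repeats the current batch. The hourly schedule is shown with a threading.Event so a stop request interrupts the wait.

```python
import threading


def run_maintenance_pass(read_batch, apply_batch, get_checkpoint, set_checkpoint,
                         batch_size=1000):
    """Drain pending journal records in small batches; returns records processed."""
    processed = 0
    while True:
        start = get_checkpoint()
        if start is None:
            start = 0  # no checkpoint yet (a real service would run a full scan first)
        records, next_usn = read_batch(start, batch_size)
        if not records:
            break  # caught up with the journal
        apply_batch(records)      # write this batch's deltas to the cache
        set_checkpoint(next_usn)  # persist progress so a restart resumes here
        processed += len(records)
    return processed


def maintenance_loop(stop_event: threading.Event, run_pass, interval_seconds=3600):
    """Run one pass, then wait an hour (by default) unless asked to stop."""
    while not stop_event.is_set():
        run_pass()
        stop_event.wait(interval_seconds)  # returns early when stop_event is set
```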

Performance Comparison

Operation             Full Scan    Incremental    Speedup
File discovery        5-15 min     10-30 sec      30-60x
App discovery         30-60 sec    2-5 sec        10-20x
Duplicate detection   10-30 min    1-2 min        10-15x
Total Deep Scan       15-45 min    1-3 min        15-30x

Benefits

  • 15-30x faster subsequent scans
  • Only hash new/changed files for duplicate detection
  • Enable "Quick Scan" option using cache only
  • Background updates keep cache fresh
  • Historical file tracking over time
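A sketch of "only hash new/changed files", assuming a simple in-memory cache of path -> (size, mtime_ns, sha256) and using size plus modification time as the change signal. That signal is an assumption; the USN Journal could provide it instead. refresh_hashes and duplicate_groups are hypothetical names.

```python
import hashlib
import os


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large videos are never fully loaded into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def refresh_hashes(paths, cache):
    """Rehash only files whose size or mtime differs from the cache; return the count hashed.

    Deleted files are assumed to have been removed from `paths` already
    (for example from journal deletions); os.stat raises for missing files.
    """
    hashed = 0
    for path in paths:
        st = os.stat(path)
        cached = cache.get(path)
        if cached and cached[0] == st.st_size and cached[1] == st.st_mtime_ns:
            continue  # unchanged since the last scan: reuse the cached hash
        cache[path] = (st.st_size, st.st_mtime_ns, sha256_file(path))
        hashed += 1
    return hashed


def duplicate_groups(cache):
    """Group cached paths by content hash; keep only groups of two or more."""
    by_hash = {}
    for path, (_, _, digest) in cache.items():
        by_hash.setdefault(digest, []).append(path)
    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```

On a second scan with no changes, refresh_hashes reads no file contents at all, which is where most of the duplicate-detection savings would come from.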

Tasks

  • Create CachedFileInfo and CachedAppInfo models
  • Implement SQLite-based IFileMetadataStore
  • Implement SQLite-based IAppMetadataStore
  • Create IncrementalFileScanner service
  • Integrate with existing UsnJournalReader
  • Implement CacheMaintenanceService background service
  • Add "Quick Scan" vs "Full Scan" option to UI
  • Add cache statistics to Deep Scan dashboard
  • Create cache invalidation/rebuild mechanism
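A sketch of the decision behind the invalidation/rebuild mechanism. JournalInfo mirrors three fields that FSCTL_QUERY_USN_JOURNAL reports in USN_JOURNAL_DATA (UsnJournalID, FirstUsn, NextUsn). CacheCheckpoint, the stored journal ID, and the schema version are hypothetical additions, since the proposal only stores the last USN. Any case where unseen records may have been lost falls back to a full rebuild.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JournalInfo:
    """Subset of USN_JOURNAL_DATA as returned by FSCTL_QUERY_USN_JOURNAL."""
    journal_id: int  # UsnJournalID: changes if the journal is deleted and recreated
    first_usn: int   # FirstUsn: oldest record still available
    next_usn: int    # NextUsn: USN the next record will receive


@dataclass
class CacheCheckpoint:
    """Hypothetical persisted cache state."""
    journal_id: Optional[int]
    next_usn_to_read: Optional[int]
    schema_version: int


CURRENT_SCHEMA_VERSION = 1  # hypothetical; bump when the cache layout changes


def choose_scan_mode(cp: CacheCheckpoint, journal: JournalInfo) -> str:
    """Return "incremental" only when every unseen journal record is still available."""
    if cp.journal_id is None or cp.next_usn_to_read is None:
        return "full"  # empty cache: nothing to update incrementally
    if cp.schema_version != CURRENT_SCHEMA_VERSION:
        return "full"  # cache layout changed: rebuild
    if cp.journal_id != journal.journal_id:
        return "full"  # journal was recreated; old USNs are meaningless
    if cp.next_usn_to_read < journal.first_usn:
        return "full"  # records we never read were purged (journal wrapped)
    if cp.next_usn_to_read > journal.next_usn:
        return "full"  # checkpoint is ahead of the journal: inconsistent state
    return "incremental"
```

The same check could also drive the "Quick Scan" vs "Full Scan" option, for example by disabling Quick Scan whenever it returns "full".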

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
