Find Duplicate Includes
Scans AsciiDoc files for include:: macros and identifies files that are included from multiple locations. This helps you find opportunities for content reuse or identify potential maintenance issues where the same content is being pulled into multiple places.
What This Tool Detects
The tool finds files that are referenced by include:: macros in more than one location:
// In getting-started.adoc
include::modules/common-prereqs.adoc[]
// In troubleshooting.adoc (same file included again)
include::modules/common-prereqs.adoc[]
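The detection described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: `INCLUDE_RE` and `find_duplicate_includes` are hypothetical names, the input is an in-memory mapping instead of files on disk, and include paths are compared exactly as written, with no relative-path resolution.

```python
import re
from collections import defaultdict

# Assumed pattern: an include macro at the very start of a line.
INCLUDE_RE = re.compile(r"^include::([^\[\s]+)\[[^\]]*\]")


def find_duplicate_includes(docs):
    """Map each include target to its (source, line) references, keeping
    only targets that are referenced more than once.

    `docs` maps a source file name to that file's text.
    """
    locations = defaultdict(list)
    for source, text in docs.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            match = INCLUDE_RE.match(line)
            if match:
                locations[match.group(1)].append((source, line_no))
    return {target: refs for target, refs in locations.items() if len(refs) > 1}


docs = {
    "getting-started.adoc": "= Getting Started\n\ninclude::modules/common-prereqs.adoc[]\n",
    "troubleshooting.adoc": "include::modules/common-prereqs.adoc[]\ninclude::modules/logs.adoc[]\n",
}
duplicates = find_duplicate_includes(docs)
for target, refs in duplicates.items():
    print(target, refs)
```

Only `modules/common-prereqs.adoc` is reported here, because `modules/logs.adoc` is included from a single location.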
Common Files
By default, the tool excludes “common” files that are expected to be included multiple times:
- attributes.adoc
- _attributes.adoc
- common/attributes.adoc
- common/revision-info.adoc
Use --include-common to include these files in the results.
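One way such a filter could work is sketched below. The matching rule is an assumption made for illustration: a path counts as common when its trailing path components equal one of the listed names. The tool may match differently. `COMMON_FILES` and `is_common` are hypothetical names.

```python
from pathlib import PurePosixPath

# Names assumed to be treated as "common" by default.
COMMON_FILES = (
    "attributes.adoc",
    "_attributes.adoc",
    "common/attributes.adoc",
    "common/revision-info.adoc",
)


def is_common(path):
    """Return True if the trailing components of `path` match a common file."""
    parts = PurePosixPath(path).parts
    for pattern in COMMON_FILES:
        pattern_parts = PurePosixPath(pattern).parts
        if parts[-len(pattern_parts):] == pattern_parts:
            return True
    return False


print(is_common("docs/common/attributes.adoc"))  # a common file
print(is_common("modules/common-prereqs.adoc"))  # an ordinary module
```

Comparing whole path components avoids false positives such as `my_attributes.adoc`, which merely ends with the same characters.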
Installation
After installing the package, run the tool from anywhere:
find-duplicate-includes [directory] [options]
Or run from source:
python3 find_duplicate_includes.py [directory] [options]
Options
| Option | Description |
|---|---|
| directory | Directory to scan (default: current directory) |
| --include-common | Include common files (attributes.adoc, etc.) in results |
| -e, --exclude-dir DIR | Directory to exclude (can be repeated) |
| --exclude-file FILE | File to exclude (can be repeated) |
| --no-output | Do not write report file (stdout only) |
| --format FORMAT | Output format: txt (default), csv, json, md |
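For readers who want to script around the tool, the options in the table map naturally onto Python's `argparse`. The parser below is a sketch that mirrors the table, not the tool's real source. `build_parser` is a hypothetical name, and the repeatable options are modelled with `action="append"`.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="find-duplicate-includes")
    parser.add_argument("directory", nargs="?", default=".",
                        help="Directory to scan (default: current directory)")
    parser.add_argument("--include-common", action="store_true",
                        help="Include common files in results")
    parser.add_argument("-e", "--exclude-dir", action="append", default=[],
                        metavar="DIR", help="Directory to exclude (can be repeated)")
    parser.add_argument("--exclude-file", action="append", default=[],
                        metavar="FILE", help="File to exclude (can be repeated)")
    parser.add_argument("--no-output", action="store_true",
                        help="Do not write report file (stdout only)")
    parser.add_argument("--format", choices=["txt", "csv", "json", "md"],
                        default="txt", help="Output format")
    return parser


args = build_parser().parse_args(
    ["./docs", "-e", "archive", "-e", "drafts", "--format", "csv"]
)
print(args.directory, args.exclude_dir, args.format)
```

Each repeated `-e` flag appends to the same list, which is how "can be repeated" is interpreted in this sketch.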
Examples
# Scan current directory
find-duplicate-includes
# Scan a specific directory
find-duplicate-includes ./docs
# Include common files in results
find-duplicate-includes --include-common
# Exclude specific directories
find-duplicate-includes -e archive -e drafts
# Generate CSV report for spreadsheet analysis
find-duplicate-includes --format csv
# Display only, no report file
find-duplicate-includes --no-output
Output
By default, the tool writes a report to ./reports/duplicate-includes_YYYY-MM-DD_HH-MM-SS.txt. Use --format to change the output format or --no-output to skip file generation.
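If you need to predict or locate a report file from a script, the naming pattern above can be reproduced as follows. This is a sketch under two assumptions: the file extension follows the value passed to --format, and the timestamp is local time. `report_path` is a hypothetical helper, not part of the tool.

```python
from datetime import datetime
from pathlib import Path


def report_path(fmt="txt", now=None, reports_dir="./reports"):
    """Build a report path of the form duplicate-includes_YYYY-MM-DD_HH-MM-SS.<fmt>."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    return Path(reports_dir) / f"duplicate-includes_{stamp}.{fmt}"


example = report_path("csv", now=datetime(2024, 1, 2, 3, 4, 5))
print(example.name)
```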
Sample Text Report
Command: find-duplicate-includes
Directory: /home/user/my-docs
Files scanned: 142
Found 3 files included more than once:
(2 common files excluded; use --include-common to see all)
======================================================================
[1] modules/common-prereqs.adoc
Included 4 times:
--------------------------------------------------
- assemblies/getting-started.adoc:12
- assemblies/installation-guide.adoc:8
- assemblies/troubleshooting.adoc:15
- assemblies/quickstart.adoc:6
[2] modules/support-statement.adoc
Included 2 times:
--------------------------------------------------
- assemblies/overview.adoc:45
- assemblies/release-notes.adoc:78
CSV Format
Included File,Inclusion Count,Is Common,Source File,Line Number,Raw Include Path
"modules/common-prereqs.adoc",4,False,"assemblies/getting-started.adoc",12,"../modules/common-prereqs.adoc"
"modules/common-prereqs.adoc",4,False,"assemblies/installation-guide.adoc",8,"../modules/common-prereqs.adoc"
Use Cases
Identify Refactoring Opportunities
Files included multiple times may be candidates for:
- Shared modules: Move frequently-included content to a dedicated shared location
- Conditional includes: Use ifdef/ifndef to include content based on context
- Attributes: Extract repeated short content into attributes
Audit Content Reuse
Track which files are reused across your documentation to:
- Understand documentation architecture
- Identify high-impact files (changes affect multiple outputs)
- Plan content updates that touch shared resources
Detect Unintentional Duplication
Sometimes the same content gets included multiple times by accident, leading to:
- Duplicated content in output
- Maintenance burden when updates are needed
- Build performance issues
Differences from find-duplicate-content
| Feature | find-duplicate-includes | find-duplicate-content |
|---|---|---|
| What it finds | Files included via include:: | Similar text blocks |
| Matching | Exact file references | Similarity-based |
| Scope | Include macros only | Notes, tables, steps, code |
| Use case | Audit file reuse | Find copy-pasted content |
Use both tools together for comprehensive content analysis.
Limitations
- Only detects include:: macros at the start of lines
- Does not resolve attribute references in include paths
- Does not treat includes inside ifdef/ifndef blocks differently from other includes
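The first two limitations can be illustrated with a small Python sketch. The regular expression is an assumed stand-in for a start-of-line matcher, not the tool's own pattern, and `{modules-dir}` is a made-up attribute name.

```python
import re

# Assumed stand-in for a matcher anchored at the start of the line.
INCLUDE_RE = re.compile(r"^include::([^\[\s]+)\[")

lines = [
    "include::modules/common-prereqs.adoc[]",        # matched
    "  include::modules/common-prereqs.adoc[]",      # indented: not at line start
    "include::{modules-dir}/common-prereqs.adoc[]",  # attribute left unresolved
]

targets = []
for line in lines:
    match = INCLUDE_RE.match(line)
    targets.append(match.group(1) if match else None)

print(targets)
```

With a matcher like this, the indented include is skipped, and the attribute-based path is recorded as a different target from the literal path even if both resolve to the same file at build time.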
Notes
- Only .adoc files are scanned
- Symlinks are ignored
- Default excluded directories: .git, .archive, target, build, node_modules
- The tool does not modify any files
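The scanning rules in these notes can be approximated with `os.walk`. The following is a sketch assuming that excluded directories are matched by name at any depth; `iter_adoc_files` and `DEFAULT_EXCLUDED_DIRS` are hypothetical names, not the tool's API.

```python
import os
from pathlib import Path

DEFAULT_EXCLUDED_DIRS = {".git", ".archive", "target", "build", "node_modules"}


def iter_adoc_files(root, excluded_dirs=DEFAULT_EXCLUDED_DIRS):
    """Yield .adoc files under `root`, skipping excluded directories and symlinks."""
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in excluded_dirs)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix == ".adoc" and not path.is_symlink():
                yield path  # files are only listed, never opened for writing


if __name__ == "__main__":
    for adoc in iter_adoc_files("."):
        print(adoc)
```

Pruning `dirnames` in place is what keeps large directories such as node_modules from being traversed at all.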
See the main README.md for installation and usage details.