From 6e85fac421ddc6de75c9d40dfce98d3036f09cd2 Mon Sep 17 00:00:00 2001
From: natehessler
Date: Wed, 19 Nov 2025 14:45:24 -0600
Subject: [PATCH 1/5] Update search indexing exclusion criteria

https://ampcode.com/threads/T-0390a39a-9c04-441e-8982-7e2ef7b9bf76

Co-authored-by: Amp
---
 docs/admin/search.mdx | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/docs/admin/search.mdx b/docs/admin/search.mdx
index 525c29e78..330ff5f3c 100644
--- a/docs/admin/search.mdx
+++ b/docs/admin/search.mdx
@@ -67,7 +67,16 @@ will not return any result.
 
 ## Indexed search
 
-Sourcegraph indexes the code on the default branch of each repository. This speeds up searches that hit many repositories at once. Not all files in a repository branch are indexed, we skip files that are [larger than 1 MB](#maximum-file-size) and binary files. To view which files are skipped during indexing, visit the repository settings page and click on indexing.
+Sourcegraph indexes the code on the default branch of each repository. This speeds up searches that hit many repositories at once. Not all files in a repository branch are indexed. We skip:
+
+- Files that are [larger than 1 MB](#maximum-file-size).
+- Binary files.
+- Files exceeding 20,000 unique trigrams (sequences of three characters).
+- Files that are not valid UTF-8.
+
+To view which files are skipped during indexing, visit the repository settings page and click on **Indexing**.
+
+To force the indexer to include specific files (like `yarn.lock` or other large text files) that are otherwise skipped, add their file path or a glob pattern to the [`search.largeFiles`](/admin/config/site_config#search-largeFiles) setting in your site configuration and reindex the repository. Note that files must still be valid UTF-8 to be indexed, even if added to `search.largeFiles`.
 
 For large deployments we recommend horizontally scaling indexed search. You can do this by [adjusting the number of replicas](https://github.com/sourcegraph/deploy-sourcegraph/blob/master/docs/configure#configure-indexed-search-replica-count). Sourcegraph shards repository indexes across replicas. When the replica count changes Sourcegraph will slowly rebalance indexes to ensure availability of existing indexes.
 

From 310917e0ffda46480ebc980898f7e0729f073926 Mon Sep 17 00:00:00 2001
From: natehessler
Date: Wed, 19 Nov 2025 14:45:28 -0600
Subject: [PATCH 2/5] Update search indexing exclusion criteria

https://ampcode.com/threads/T-0390a39a-9c04-441e-8982-7e2ef7b9bf76

Co-authored-by: Amp

From 64d24ad85ef515aee46d3a14c840095e13e5b66f Mon Sep 17 00:00:00 2001
From: natehessler
Date: Wed, 19 Nov 2025 15:12:15 -0600
Subject: [PATCH 3/5] Fix link to search.largeFiles setting in documentation

Updated link in search documentation for large files setting.
---
 docs/admin/search.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/admin/search.mdx b/docs/admin/search.mdx
index 330ff5f3c..9785e7b5b 100644
--- a/docs/admin/search.mdx
+++ b/docs/admin/search.mdx
@@ -76,7 +76,7 @@ Sourcegraph indexes the code on the default branch of each repository. This spee
 
 To view which files are skipped during indexing, visit the repository settings page and click on **Indexing**.
 
-To force the indexer to include specific files (like `yarn.lock` or other large text files) that are otherwise skipped, add their file path or a glob pattern to the [`search.largeFiles`](/admin/config/site_config#search-largeFiles) setting in your site configuration and reindex the repository. Note that files must still be valid UTF-8 to be indexed, even if added to `search.largeFiles`.
+To force the indexer to include specific files (like `yarn.lock` or other large text files) that are otherwise skipped, add their file path or a glob pattern to the [`search.largeFiles`](/admin/config/site_config) setting in your site configuration and reindex the repository. Note that files must still be valid UTF-8 to be indexed, even if added to `search.largeFiles`.
 
 For large deployments we recommend horizontally scaling indexed search. You can do this by [adjusting the number of replicas](https://github.com/sourcegraph/deploy-sourcegraph/blob/master/docs/configure#configure-indexed-search-replica-count). Sourcegraph shards repository indexes across replicas. When the replica count changes Sourcegraph will slowly rebalance indexes to ensure availability of existing indexes.
 

From 7737198def5d387808a2923211e73ed900d83d23 Mon Sep 17 00:00:00 2001
From: natehessler
Date: Wed, 19 Nov 2025 15:24:41 -0600
Subject: [PATCH 4/5] Remove markdown link formatting in search.mdx

---
 docs/admin/search.mdx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/admin/search.mdx b/docs/admin/search.mdx
index 9785e7b5b..c25c0ec00 100644
--- a/docs/admin/search.mdx
+++ b/docs/admin/search.mdx
@@ -76,9 +76,9 @@ Sourcegraph indexes the code on the default branch of each repository. This spee
 
 To view which files are skipped during indexing, visit the repository settings page and click on **Indexing**.
 
-To force the indexer to include specific files (like `yarn.lock` or other large text files) that are otherwise skipped, add their file path or a glob pattern to the [`search.largeFiles`](/admin/config/site_config) setting in your site configuration and reindex the repository. Note that files must still be valid UTF-8 to be indexed, even if added to `search.largeFiles`.
+To force the indexer to include specific files (like `yarn.lock` or other large text files) that are otherwise skipped, add their file path or a glob pattern to the search.largeFiles setting in your site configuration and reindex the repository. Note that files must still be valid UTF-8 to be indexed, even if added to `search.largeFiles`.
 
-For large deployments we recommend horizontally scaling indexed search. You can do this by [adjusting the number of replicas](https://github.com/sourcegraph/deploy-sourcegraph/blob/master/docs/configure#configure-indexed-search-replica-count). Sourcegraph shards repository indexes across replicas. When the replica count changes Sourcegraph will slowly rebalance indexes to ensure availability of existing indexes.
+For large deployments we recommend horizontally scaling indexed search. You can do this by adjusting the number of replicas. Sourcegraph shards repository indexes across replicas. When the replica count changes Sourcegraph will slowly rebalance indexes to ensure availability of existing indexes.
 
 The resource requirements for indexed search vary considerably based on the text contents of your repositories, but a good estimate is that the node should have enough memory to hold the entire text contents of the default branch of each repository.
 

From b43ab58baf6174935d03d38cf2d016843863c5b7 Mon Sep 17 00:00:00 2001
From: natehessler
Date: Wed, 19 Nov 2025 15:32:45 -0600
Subject: [PATCH 5/5] Add link to search.largeFiles setting in documentation

Updated the documentation to include a link for the search.largeFiles setting.
---
 docs/admin/search.mdx | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/admin/search.mdx b/docs/admin/search.mdx
index c25c0ec00..f88c3afa0 100644
--- a/docs/admin/search.mdx
+++ b/docs/admin/search.mdx
@@ -76,9 +76,9 @@ Sourcegraph indexes the code on the default branch of each repository. This spee
 
 To view which files are skipped during indexing, visit the repository settings page and click on **Indexing**.
 
-To force the indexer to include specific files (like `yarn.lock` or other large text files) that are otherwise skipped, add their file path or a glob pattern to the search.largeFiles setting in your site configuration and reindex the repository. Note that files must still be valid UTF-8 to be indexed, even if added to `search.largeFiles`.
+To force the indexer to include specific files (like `yarn.lock` or other large text files) that are otherwise skipped, add their file path or a glob pattern to the [search.largeFiles](https://sourcegraph.com/docs/admin/search#maximum-file-size) setting in your site configuration and reindex the repository. Note that files must still be valid UTF-8 to be indexed, even if added to `search.largeFiles`.
 
-For large deployments we recommend horizontally scaling indexed search. You can do this by adjusting the number of replicas. Sourcegraph shards repository indexes across replicas. When the replica count changes Sourcegraph will slowly rebalance indexes to ensure availability of existing indexes.
+For large deployments we recommend horizontally scaling indexed search. You can do this by adjusting the [number of replicas](https://sourcegraph.com/docs/admin/deploy/kubernetes/configure). Sourcegraph shards repository indexes across replicas. When the replica count changes Sourcegraph will slowly rebalance indexes to ensure availability of existing indexes.
 
 The resource requirements for indexed search vary considerably based on the text contents of your repositories, but a good estimate is that the node should have enough memory to hold the entire text contents of the default branch of each repository.
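
Reviewer note: the skip rules this series documents can be sketched as a small predicate, which may help when checking whether a given file would be indexed. This is an illustrative sketch only, not the indexer's actual code. The function names, the NUL-byte test for "binary", the order of the checks, and the use of Python's `fnmatch` as a stand-in for the glob matching of `search.largeFiles` are all assumptions. Only the 1 MB limit, the 20,000 unique trigram limit, and the UTF-8 requirement come from the documentation text in the patch.

```python
from fnmatch import fnmatch
from typing import Optional, Sequence

MAX_FILE_SIZE = 1 << 20        # "larger than 1 MB" limit from the docs text
MAX_UNIQUE_TRIGRAMS = 20_000   # unique trigram limit from the docs text


def unique_trigram_count(text: str) -> int:
    """Count distinct sequences of three characters in text."""
    return len({text[i:i + 3] for i in range(len(text) - 2)})


def skip_reason(path: str, content: bytes,
                large_files: Sequence[str] = ()) -> Optional[str]:
    """Return why a file would be skipped, or None if it would be indexed.

    large_files is a hypothetical stand-in for the search.largeFiles patterns.
    """
    # Assumption: treat any NUL byte as a sign of binary content.
    if b"\x00" in content:
        return "binary"
    # Per the docs text, invalid UTF-8 is skipped even when listed in search.largeFiles.
    try:
        text = content.decode("utf-8")
    except UnicodeDecodeError:
        return "not valid UTF-8"
    # Files matching a large-file pattern bypass the size and trigram limits.
    if any(fnmatch(path, pattern) for pattern in large_files):
        return None
    if len(content) > MAX_FILE_SIZE:
        return "larger than 1 MB"
    if unique_trigram_count(text) > MAX_UNIQUE_TRIGRAMS:
        return "too many unique trigrams"
    return None
```

For example, `skip_reason("yarn.lock", big_lockfile_bytes)` reports the size limit for an oversized lockfile, while passing `large_files=["yarn.lock"]` lets the same file through as long as it decodes as UTF-8.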