This is intended as a short post.
I thought I’d put together some basic search tweaks and configurations originating from client requests. Nothing all that original, but perhaps helpful as a quick reference. I’ll focus on common tweaks from the perspective of restricting search visibility and access. This is not a comprehensive post on the topic, and it reflects my opinions as well as some standard techniques.
The following is a “quick start” list of tips on this topic; more advanced configurations go beyond those listed here.
- Designate a home for sensitive documents (configuring for known locations is better than configuring for unknowns), given the shared aspects of Search.
- The administrative Search account should have only minimal permissions (SharePoint read permissions are granted by default). Elevated permissions, such as administrative rights, can cause Search to index old document versions and unpublished materials, which is not good.
- Prevent sensitive items from being indexed using some of the techniques found in this blog post.
- Do not use fine-grained (customized) permissions for accessing documents, if possible.
- Adding a sensitive site collection to a managed path designated with explicit inclusion creates a visibly unique path (i.e., not on a wildcard path), which could help prevent fat-fingering of exclusion rules in Search settings. See the next tip.
- Add crawl rules that exclude content (using central administration) – see example in this post.
- Keep in mind that lists and libraries expose their contents using ASPX pages, and those pages could potentially be indexed as well. See the examples in this post.
- Search scopes operate on indexed content (they are defined and managed locally on the site, or at the farm level). If content is not indexed, it will never show up in a scope. This is a good thing for eliminating search discovery of sensitive content.
- Be aware that best bets are not security trimmed within search, which is not so good.
I said recently that minimizing visibility in SharePoint search works by “exclusion” – this is not totally accurate but gets to the point when thinking about hiding objects from search results.
Restricting Search Access to Content at the site level
From your site’s home page, go to Home > Site Settings > Search Visibility
The following screen appears
The descriptions can be confusing, so be sure you have a clear understanding of the difference between “site content” and “page content” as used here. The first question pertains to the site itself showing up in Search results; answering “No” under “Indexing Site Content” prevents that.
In addition, whether the items displayed within the site’s ASPX pages (text, images, the contents of web parts) are indexed depends on the answer to the second question, “Indexing ASPX Page Content”. Most of the time you keep the defaults. However, keep in mind that the pages used to display a library need to be secured even if the library itself is already secured (see next section), so you would also need to set “Indexing ASPX Page Content” to “Never index any ASPX pages on this site” to cover the entire site.
To prevent individual ASPX pages from being indexed, add the following tag to each page (use this option for more granular control over visibility). The rest of the site may still be indexed.
<META NAME="ROBOTS" CONTENT="NOHTMLINDEX"/>
See http://technet.microsoft.com/en-us/library/cc287898.aspx for details. The tag can be added in SharePoint Designer 2007 if desired.
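If you want to confirm that a page really carries the tag, a small script can check the saved HTML. The following is only a minimal sketch in Python using the standard library; the class and function names are my own illustration and have nothing to do with SharePoint itself.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the CONTENT directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # CONTENT may hold several comma-separated directives.
        for part in attr_map.get("content", "").split(","):
            if part.strip():
                self.directives.append(part.strip().upper())


def page_opts_out_of_indexing(html_text):
    """Return True if the page contains the NOHTMLINDEX robots directive."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    return "NOHTMLINDEX" in finder.directives
```

Feed it the page source (for example, text saved from the browser’s “view source”) and it reports whether the directive is present. It says nothing about whether the crawler has already indexed the page.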
Indexing and search visibility are very different from item permissions, which control access to the items themselves. Permission levels can be set for sites, lists, libraries, or at the item level (using groups/custom/item level permissions). But keep things simple: try to collect secure documentation in a few places, and understand who has permissions and what needs to be visible in search. Also keep in mind that once an item is indexed, its potential to disclose private information is greatly enhanced. Documents are made secure by following some simple rules, but these rules do not replace careful analysis of your use case. Mileage may vary.
If you decide to create your own “fine-grained” permissions, then you may want to keep the default setting, “Do not index ASPX pages if site contains fine-grained permissions”. With this setting, if custom permissions are later set on the site, ASPX pages within the site will not be indexed. This kind of permission is often indicative of sensitive information.
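The way the ASPX page setting interacts with fine-grained permissions can be written out as a small decision function. This is just a sketch of my reading of the options in Python, not SharePoint code; the names are hypothetical, and the “always index” choice is my assumption about the remaining option on the settings page, since it is not discussed in this post.

```python
from enum import Enum


class AspxIndexing(Enum):
    # The default: skip ASPX pages once fine-grained permissions exist.
    UNLESS_FINE_GRAINED = "do not index if fine-grained permissions exist"
    # Assumed option, not named in this post.
    ALWAYS = "always index"
    NEVER = "never index"


def aspx_pages_indexed(setting, site_has_fine_grained_permissions):
    """Return True if the site's ASPX pages would be indexed under this model."""
    if setting is AspxIndexing.NEVER:
        return False
    if setting is AspxIndexing.ALWAYS:
        return True
    # Default behaviour: index only while no fine-grained permissions are set.
    return not site_has_fine_grained_permissions
```

The point of the default is visible in the last line: the same site flips from indexed to not indexed as soon as someone adds custom permissions, with no change to the search setting.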
Restricting Search Access to Document Libraries and Lists at the site level
From your site’s home page, go to Home > Documents > Settings > Advanced Settings
If users have no permissions on your site, they cannot open its documents, but the documents may still show up in Search results unless the option below is set to prevent it.
Set “Allow items from the document library to appear in search results?” to “No”. The same applies to lists as well as document libraries.
Farm-level settings for excluding only a portion of a web site
Navigate to the following path: Shared Services Administration: SharedServices1 > Search Administration > Crawl rules
You can exclude http://toplevel/sensitive by adding this rule as follows.
This crawl rule MUST come before any other rules that include content on the same site. That is, once a rule matches a URL, that’s it: later rules are not evaluated for that URL. Farm-level Search configuration goes well beyond a basic crawl rule, but this is the essential action for restricting content indexing. I’ll add more later and perhaps get into search tweaks for SP 2010.
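To make the ordering point concrete, here is a simplified model of first-match rule evaluation in Python. It is a sketch only: the rule lists, the wildcard patterns, and the “include by default” fallback are my assumptions for illustration, and the matching is done with Python’s fnmatch rather than SharePoint’s own crawler logic.

```python
from fnmatch import fnmatchcase

# Hypothetical rule lists: (path pattern, action), evaluated top to bottom.
RULES_EXCLUDE_FIRST = [
    ("http://toplevel/sensitive*", "exclude"),
    ("http://toplevel/*", "include"),
]
# Same rules in the wrong order: the broad include rule comes first.
RULES_INCLUDE_FIRST = list(reversed(RULES_EXCLUDE_FIRST))


def crawl_decision(url, rules, default="include"):
    """Return the action of the first rule whose pattern matches the URL."""
    for pattern, action in rules:
        # Compare case-insensitively; '*' matches any run of characters.
        if fnmatchcase(url.lower(), pattern.lower()):
            return action  # first match wins, later rules are ignored
    return default  # assumed behaviour when no rule matches
```

With the exclusion rule first, crawl_decision("http://toplevel/sensitive/docs/plan.docx", RULES_EXCLUDE_FIRST) gives "exclude". With the order reversed, the broad include rule matches first and the same URL comes back as "include", which is exactly the mistake to avoid.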