Basic SharePoint 2007 Search Configurations: Minimizing Search Visibility

 

This is intended as a short post.

I thought I’d put together some basic search tweaks and configurations originating from client requests. Nothing all that original, but perhaps helpful as a quick reference. I’ll focus on common tweaks from the perspective of restricting search visibility and access. This is not a comprehensive post on the topic, and it reflects my opinions as well as some standard techniques.

The following is a “quick start” list of tips on this topic; more advanced configurations go beyond those listed here.

  1. Designate a home for sensitive documents, given the shared aspects of Search (configuring for known locations is better than configuring for unknowns).
  2. The administrative Search account should have only minimal permissions (SharePoint read permissions are granted by default). Elevated permissions, such as administrative rights, can cause searches to index old document versions and unpublished materials (not good).
  3. Prevent sensitive items from being indexed using some of the techniques found in this blog post.
  4. Do not use fine-grained (customized) permissions for accessing the document, if possible.
  5. Adding a sensitive site collection to a managed path designated with explicit inclusion creates a visibly unique path (ex: not on a wildcard path), which could help prevent fat-fingering of exclusion rules in Search settings (see next tip).
  6. Add crawl rules that exclude content (using central administration). See the example in this post.
  7. Keep in mind that lists and libraries expose their contents using ASPX pages, and those pages could potentially be indexed as well. See the examples in this post.
  8. Search scopes are about indexed content (defined and managed locally on the site, or at the farm level). If content is not indexed then it will never show up in a scope. This is a good thing for eliminating search discovery of sensitive content.
  9. Be aware that best bets are not security trimmed within search (not so good).

I said recently that minimizing visibility in SharePoint search works by “exclusion” – this is not totally accurate but gets to the point when thinking about hiding objects from search results.

 

Restricting Search Access to Content at the site level

From your site’s home page, go to Home > Site Settings > Search Visibility.

The following screen appears:

search1

The descriptions can be confusing, so be sure you have a clear understanding of the difference between “site content” and “page content” as used here. The first question, “Indexing Site Content”, pertains to the site itself showing up in Search results; answering “No” would prevent that.

In addition, whether the items displayed within the site’s ASPX pages (text, images, the contents of web parts) are indexed depends on the answer to the second question, “Indexing ASPX Page Content”. Most of the time you keep the defaults. However, keep in mind that the pages used to display a library need to be secured even if the library itself is already secured (see next section), so to cover the entire site you would also need to set “Indexing ASPX Page Content” to “Never index any ASPX pages on this site”.

Indexing can also be prevented for individual ASPX pages by adding the following tag to each page (use this option for more granular visibility). The rest of the site may still be indexed.

<META NAME="ROBOTS" CONTENT="NOHTMLINDEX"/>

See http://technet.microsoft.com/en-us/library/cc287898.aspx for details. The tag can be added in SharePoint Designer 2007 if desired.
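To verify that a page actually carries the tag, a small check along the following lines may help. This is only a sketch using Python’s standard library: the function and class names are made up for illustration, it inspects HTML you have already saved or fetched yourself, and it says nothing about how the SharePoint crawler itself reads the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().upper()
            if directive:
                self.directives.add(directive)


def page_opts_out_of_indexing(html_text):
    """Return True if the page declares ROBOTS / NOHTMLINDEX (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    parser.close()
    return "NOHTMLINDEX" in parser.directives


tagged = '<html><head><META NAME="ROBOTS" CONTENT="NOHTMLINDEX"/></head><body></body></html>'
untagged = "<html><head><title>Plain page</title></head><body></body></html>"
print(page_opts_out_of_indexing(tagged))
print(page_opts_out_of_indexing(untagged))
```

Running a check like this over the pages you edited in SharePoint Designer is a quick way to catch a page that was missed.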

Indexing and search visibility are very different from item permissions, which control access to the items themselves. Permission levels can be set for sites, lists, libraries, or individual items (using groups/custom/item level permissions). But keep things simple: try to collect secure documentation in a few places, and understand who has permissions and what needs to be visible in search. Also keep in mind that once an item is indexed, its potential to disclose private information is greatly enhanced. A few simple rules go a long way toward securing documents, but they do not replace careful analysis of your use case. Mileage may vary.

If you decide to create your own permission levels (“fine-grained” permissions), then you may want to keep the default setting, “Do not index ASPX pages if site contains fine-grained permissions”. With this setting, if custom permissions are later set on the site, the ASPX pages within the site will not be indexed. This kind of permission level is often indicative of sensitive information.
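The way the ASPX options described above interact can be modeled as a small decision function. This is only a sketch of the behavior as described in this post, not SharePoint’s implementation: the enum, the function name, and the `ALWAYS` member (a stand-in for any setting that indexes pages regardless of permissions) are assumptions made for illustration.

```python
from enum import Enum


class AspxIndexing(Enum):
    # "Do not index ASPX pages if site contains fine-grained permissions" (the default)
    SKIP_IF_FINE_GRAINED = 1
    # "Never index any ASPX pages on this site"
    NEVER = 2
    # Assumed stand-in for a setting that indexes ASPX pages unconditionally
    ALWAYS = 3


def aspx_pages_indexed(mode, site_has_fine_grained_permissions):
    """Return True if the site's ASPX pages would be indexed under this model."""
    if mode is AspxIndexing.NEVER:
        return False
    if mode is AspxIndexing.SKIP_IF_FINE_GRAINED:
        # Pages are indexed only while no custom permissions exist on the site.
        return not site_has_fine_grained_permissions
    return True


for mode in AspxIndexing:
    for fine_grained in (False, True):
        print(mode.name, fine_grained, aspx_pages_indexed(mode, fine_grained))
```

The point of the default is visible in the second case: pages that were indexed stop being indexed once custom permissions appear, without anyone revisiting the search settings.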

 

Restricting Search Access to Document Libraries and Lists at the site level

From your site’s home page, go to Home > Documents > Settings > Advanced Settings.

Users without any permissions to your site cannot open its documents, but the documents may still show up in Search results unless the option that allows this is turned off, as shown below.

search2

Set “Allow items from the document library to appear in search results?” to “No”. The same setting applies to lists as well as document libraries.

 

Farm-level settings for excluding only a portion of a web site

Navigate to the following path: Shared Services Administration: SharedServices1 > Search Administration > Crawl rules

 search3

You can add crawl rules to include or exclude content. For example, suppose a site collection is http://toplevel and contains a child site http://toplevel/sensitive.

You can exclude http://toplevel/sensitive by adding a rule as follows.

search4

This crawl rule MUST be ordered ahead of any rule that includes content on the same site. Rules are evaluated in order, and once a rule matches, that’s it: no later rule is applied to that URL. Farm-level Search configuration goes well beyond a basic crawl rule, but this is the essential action for restricting content indexing. I’ll add more later and perhaps look at search tweaks for SP 2010.
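To see why the order matters, here is a rough model of first-match rule evaluation. It is a sketch only: the wildcard patterns, the `is_crawled` helper, and the use of Python’s `fnmatch` for matching are assumptions for illustration and do not reproduce the exact matching logic of the SharePoint crawler.

```python
from fnmatch import fnmatchcase


def is_crawled(url, rules, default=True):
    """Apply the first rule whose pattern matches the URL; later rules are ignored.

    `rules` is an ordered list of (pattern, include) pairs, where include=False
    means the matching content is excluded from the crawl.
    """
    for pattern, include in rules:
        if fnmatchcase(url.lower(), pattern.lower()):
            return include
    return default  # assumed behavior when no rule matches


# Exclusion rule first: the sensitive child site is kept out of the index.
exclude_first = [
    ("http://toplevel/sensitive*", False),
    ("http://toplevel/*", True),
]

# Same rules in the wrong order: the include rule matches first and wins.
include_first = list(reversed(exclude_first))

sensitive_url = "http://toplevel/sensitive/plan.docx"
print(is_crawled(sensitive_url, exclude_first))   # False
print(is_crawled(sensitive_url, include_first))   # True
```

With the exclusion rule on top, the sensitive site never reaches the broader include rule; with the order reversed, the same two rules let it through.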

Cheers!
