Tuesday, June 25, 2013

Thoughts about Building Multilingual Publishing Site on SharePoint 2013 - Part 3 of 3

This is the third part of the series describing my experience in planning and building a multilingual WCM site on SharePoint 2013.

Part 1 discusses basic requirements and architecture of the authoring site.
Part 2 focuses on managed navigation for the authoring site.
This part discusses Cross-Site Publishing (XSP) and publishing sites.

A Quick Introduction to Cross-Site Publishing

Microsoft has released a blog series describing in detail how to set up a fictitious Contoso site leveraging XSP. Contoso is a product-centric web site that uses the concept of category pages and catalog item pages to illustrate the applicability of XSP. A product sold at the Contoso electronics store can be categorized by a hierarchy of categories. For example, a specific product, the "Datum SLR Camera X142", is categorized as Cameras >> Digital Cameras. There are only two kinds of pages that we need here: a catalog item page showing product information, and a category page showing products matching the category. So, if you pick the "Cameras" category from the navigation menu, you will see products qualifying as cameras; if you pick "Digital Cameras", you will see a narrower set of products qualifying as digital cameras. Regardless of which category in the hierarchy you pick, the principle is the same: we need to determine the current category and render the matching products for it. Category pages do exactly that. Next, you click on a specific product listed by the current view of the category page, and the product details are rendered by the catalog item page, which accepts a unique product identifier. And so you can surface the entire product database on the SharePoint site by using just two pages: a category page and a catalog item page.

There are two "magic ingredients" here. The first is the ability to publish and consume lists as catalogs. Consuming a catalog creates a search result source in the consuming site, and optionally pins the terms from the term set used to categorize the catalog as navigation terms to the navigation term set of the consuming site. Also, behind the scenes, the Microsoft.SharePoint.Publishing.Navigation.TaxonomyNavigation class, called from the Microsoft.SharePoint.Publishing.PublishingHttpModule class, "becomes aware" of how to detect requested URLs constructed of categories and unique identifiers, aka "Friendly URLs", on this site, and route them to the appropriate category or catalog item pages. The second is that the pages "can be made aware" of their current context and use this knowledge when issuing search queries for catalog items. This ability is there thanks to a set of well-known search query variables available to developers or information workers placing search web parts on pages.
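To make the routing idea concrete, here is a minimal conceptual sketch in Python. It is not how SharePoint implements friendly URLs; the category paths, the host name, and the assumption that an item URL is a category path followed by one identifier segment are all made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical navigation term paths; on a real site these come from the term set.
CATEGORY_PATHS = {"cameras", "cameras/digital-cameras"}


def route(url, categories=CATEGORY_PATHS):
    """Classify a friendly URL as a category page or a catalog item page."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    path = "/".join(segments)
    if path in categories:
        # The whole path is a known category: render the category page.
        return ("category", path)
    # Otherwise treat the last segment as a unique item identifier
    # that hangs off a known category (an assumption of this sketch).
    parent = "/".join(segments[:-1])
    if segments and parent in categories:
        return ("item", parent, segments[-1])
    return ("not_found",)


category_result = route("http://contoso.example/cameras/digital-cameras")
item_result = route("http://contoso.example/cameras/digital-cameras/x142")
```

The point is only that one category page and one item page are enough: the URL decides which of the two gets the request and what it should query for.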

Cross-Site Publishing as a Web Content Management Approach

Things started to look a bit confusing when I tried to apply the XSP concept to the site publication scenario that I have described in Part 1. Here are the hurdles I faced:

1. A web site page is hard to fit in the product-centric model described above, because a page can often be both a category and a catalog item at the same time. Commonly sites use section landing pages, the ones the top navigation menus point at, which contain content and at the same time act as logical parents to child pages down the site map hierarchy. Let's say we make a landing page an XSP catalog category page. Then, according to the product-centric model, we access its child pages as we would access catalog items: by clicking on a list of these pages rendered by our landing page. Well, this is usually not how information architects expect users to navigate their web sites, unless they are online product stores. What we often need instead is navigation menus, the familiar top navigation and current navigation controls that let us access all pages on the site.

2. Now think for a second about the managed navigation discussed in Part 2 and the requirement we had about maintaining the fidelity between authoring and publishing sites. Every page on the authoring site has a corresponding term used for navigation and forming a friendly URL. Because we use managed navigation on the authoring site, we get our top and current navigation menus populated by these terms. We want the same behavior on the publishing site. This raises the question: "Which pages should be the category pages, and which ones should be the catalog item pages?" If we were to follow the classical product-centric approach, and designate pages corresponding to the top-level nodes as category pages, and leaf pages as item pages, we would lose the drop-down menus on the publishing site:



When I say "designate" I mean that we "tag" the pages with the terms or, in other words, assign term values to corresponding pages by setting managed metadata fields of these pages.

3. At the other extreme, if we tag each and every page on the authoring site with a term, then when we consume the Pages library catalog, all our pages will logically become category pages. How do we now render the catalog item information?

We resolved the confusion by tagging all the pages on the authoring site, effectively making all of them category pages on the publishing site, and modified their page layouts to make them simultaneously act as catalog item pages. This looked like a promising strategy, because by making all of our pages into category pages, we would get exactly the same top and current navigation elements as on the authoring site, for free. The only thing left to do was to make the category pages render catalog item information, which should be doable on a search-driven site.

If you examine an automatically created catalog item page, you will see that it is based on a page layout, which in turn leverages Catalog Item Reuse (CIR) web parts. Each CIR web part renders the contents of a specific managed search property specified in the SelectedPropertiesJSON property. Provided that the managed properties correspond to the columns on the authoring site, this results in rendering the same content on the publishing page as on the authoring page. Below is an example of a CIR web part rendering the value of a managed property corresponding to the Page Content column.


Note that the value of UseSharedDataProvider property should be set to True on all CIR web parts on the page except for one, which serves as a data provider for the rest of them. All CIR web parts on a page form what’s called a query group, with one CIR web part acting as a data provider, and the rest of them – as data consumers. The data provider CIR web part in addition to SelectedPropertiesJSON property has DataProviderJSON property set as shown in the following example.



The value of DataProviderJSON property sets properties on objects of DataProviderScriptWebPart type. The key properties here are:
  • QueryTemplate – defines the keyword search filter criterion using a query variable such as {URLTOKEN.1} in the above example.
  • SourceID – a unique ID of the search result source corresponding to the catalog being consumed. The search result source is created automatically when the catalog is connected to.
  • Scope – a URL of the catalog source list.
An easy way to get started with the CIR web parts is to let SharePoint auto-generate catalog Category and Item pages and page layouts when connecting to a catalog, then harvest the web part markup from the auto-generated page layouts.
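As a rough illustration of the three key properties listed above, the following Python sketch assembles a JSON string from them. The query template is the one used later in this post; the GUID and the catalog URL are placeholders, and the real DataProviderJSON value typically carries more settings than these three, so harvest the full value from an auto-generated page layout rather than building it from scratch.

```python
import json


def build_data_provider_settings(query_template, source_id, scope_url):
    """Assemble only the three key settings discussed in the text."""
    settings = {
        "QueryTemplate": query_template,  # keyword query with query variables
        "SourceID": source_id,            # ID of the catalog's result source
        "Scope": scope_url,               # URL of the catalog source list
    }
    return json.dumps(settings)


example_settings = build_data_provider_settings(
    "VanityNameOWSTEXT:{URLTOKEN.1}",
    "00000000-0000-0000-0000-000000000000",  # placeholder, not a real source ID
    "http://authoring.example/Pages",        # hypothetical catalog list URL
)
```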

So conceptually the problem is solved now: we can create our own page layout and then a category page based on it, and configure CIR web parts to select the managed properties of interest from the catalog. If the markup of the page layout and master page is the same as on the authoring site, then CIR web parts essentially replace the content fields, the publishing page appears visually identical to its authoring counterpart, and the navigation works, even though only a single page exists on the publishing site. Pretty cool.

Now let's consider some practical aspects of getting the XSP-based publishing site up and running.

XSP Navigation Term Set and Vanity Names

To properly publish the Pages library as a catalog we need to designate a field uniquely identifying each page, and a managed metadata field used for page categorization. We came up with two fields for this purpose and defined them at the site collection level in order to make sure the required managed properties get created automatically when we do a full crawl:
  • Vanity Name - this is a text field where we enter a unique friendly name of each page.
  • XSP Category - this is a managed metadata field using a new term set named XSP Navigation.
On the authoring site we create and manage a master term set, which is applied to the source variation; its terms are then reused and translated on the target variations. We certainly want to avoid duplicating the effort required to manage the terms when we tag our pages, but we cannot reuse the existing master term set, because SharePoint complains about it being already in use when we consume the catalog. We need a new term set, and so we just pin top-level terms with children from the master Site Navigation term set. Another important thing we need to do is to create a Root term in the new XSP Navigation term set so that we can hook up to it when we consume the catalog. This is illustrated in the figure below:


The convention we used was that the value of the Vanity Name field must match the value of the XSP Category node for any given page on the site. This is important because it allows us to structure the query we issue from the data provider CIR web part as follows:

VanityNameOWSTEXT:{URLTOKEN.1}

This query means "select items where the value of the Vanity Name managed property contains the last segment of the current URL". So for the URL http://www.softforte.com/softforte-corporate/about-us the {URLTOKEN.1} == "about-us", and due to the convention the Vanity Name == "about-us" as well. The page is a category page in XSP terms, and has a managed term, created when we consumed the catalog, which points at the URL /softforte-corporate/about-us. This is exactly how we hacked the category pages, giving them the ability to act as catalog item pages at the same time.
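The token substitution can be mimicked in a few lines of Python. This is only a sketch of the behavior described above, not SharePoint code: the function name is made up, and returning an empty string for a token that points past the available segments is an assumption of the sketch.

```python
import re
from urllib.parse import urlparse


def expand_url_tokens(template, url):
    """Replace {URLTOKEN.n} with the n-th URL path segment counted from the right."""
    segments = [s for s in urlparse(url).path.split("/") if s]

    def replace(match):
        n = int(match.group(1))
        # {URLTOKEN.1} is the last segment, {URLTOKEN.2} the one before it.
        # Out-of-range tokens become empty strings (an assumption of this sketch).
        return segments[-n] if 1 <= n <= len(segments) else ""

    return re.sub(r"\{URLTOKEN\.(\d+)\}", replace, template)


page_url = "http://www.softforte.com/softforte-corporate/about-us"
expanded = expand_url_tokens("VanityNameOWSTEXT:{URLTOKEN.1}", page_url)
# expanded is now "VanityNameOWSTEXT:about-us"
```

Because of the naming convention, the expanded query matches exactly the page whose Vanity Name equals the last URL segment.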


Navigation Translation on Publishing Sites

The terms used on the publishing sites are already translated by virtue of setting their labels for each culture, and customizing term-driven page settings down the reuse chain originating from the authoring source variation's navigation term set. The challenge, however, is selecting the proper translation for the corresponding publishing site.

In order to localize the content and navigation of a variations-based site you do not need to install a language pack. The locale, such as fr-CA, is selected when a variation label is created. This is different for a publishing site, which does not rely on variations. The only way I found to get terms to translate to French was to install the French language pack, then create a site collection using the French target language template. In order to still manage the site in English, the Alternate Language can be set under Site Settings >> Language Settings. This forces SharePoint to read the language preferences the browser is sending to it. So if my preferred language is English, I will see the managed navigation in English; if it is French, it will automatically use the French term translation if one is available. The limitation here is the language packs: we can only publish in the languages supported by language packs, at least to the best of my knowledge.
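The selection behavior described above can be modeled roughly as follows. This is a conceptual Python sketch, not SharePoint's actual logic: the function name and the label dictionary are hypothetical, and matching on the primary language subtag only is a simplification.

```python
def pick_term_label(accept_language, labels, default_language):
    """Pick a term label based on the browser's Accept-Language header."""
    preferences = []
    for part in accept_language.split(","):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        preferences.append((quality, tag.strip().lower()))
    # Stable sort: header order is kept for equal quality values.
    preferences.sort(key=lambda pref: -pref[0])
    for _, tag in preferences:
        primary = tag.split("-")[0]  # "fr-CA" -> "fr"
        if primary in labels:
            return labels[primary]
    # No preferred language has a translation: fall back to the default.
    return labels[default_language]


# Hypothetical labels of a single navigation term.
term_labels = {"en": "About Us", "fr": "Qui sommes-nous"}
french_label = pick_term_label("fr-CA,fr;q=0.9,en;q=0.8", term_labels, "en")
fallback_label = pick_term_label("de", term_labels, "en")
```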

Do I Really Need a Single Page on My Publishing Site? 

Since most real-life sites use multiple page layouts, the same page layouts need to be mirrored over to the publishing sites, with CIR web parts configured there instead of content fields. If all the content on the authoring site is static, i.e. there are no web parts, just the content fields, then we can simply create catalog category pages, one page per page layout, and that would be all we need to do.

Real web sites use web parts to render dynamic content. Since web parts are not stored in the index, cross-site publishing will not render them. This means that the web parts need to be re-created on the publishing site. And since publishing and authoring sites have different security requirements, the web parts need to be created differently on each of them.

Wherever possible, we should therefore utilize Content Search (CS) web parts, which get the content from the SharePoint index just as the CIR web parts do. We were able to meet 100% of the requirements for the dynamic elements on the site using CS web parts, thanks to their great flexibility. When a CS web part is used on an authoring site page, it runs in a different context than when it is used on a publishing site. Therefore it needs to be adapted to the publishing site, and simply copying it over won't work. Most of the time the adaptation is quite simple, however, as the same managed properties are available on both sites, and therefore the search queries are similar.

How big of an issue is the fact that the dynamic elements on content pages need to be recreated on a publishing site? We found it to be quite manageable in our case. The level of work duplication was minor for us, although it depends on how many dynamic web parts you are planning for. A general corollary here is that you need to plan your pages thoroughly in advance to get the most out of XSP.

So while you could use just a single category page to render all pages based on a specific page layout, once you add dynamic web parts to the mix you need to create a new page in the publishing site for each corresponding authoring page with a web part. If the same web part is used on multiple pages, then it makes sense to embed it into the publishing page layout directly in order to reduce the number of publishing pages you need to create.

Here is an example of how an authoring page corresponds with a publishing page layout and a publishing page:


On the above illustration, the legend is as follows:

  • Purple color - content fields;
  • Gray color - items inherited from a master page or page layout;
  • Green color - web parts;
  • Blue color - navigation control.

The Inconvenient Result Source ID Value

In the code snippet above the DataProviderJSON property included a SourceID parameter, which is the ID of a Search Result Source that is automatically configured when a catalog is consumed on the site. This property controls the scope of search queries issued by the web parts participating in the query group by adding filtering to select items from the list with a specific ID and source web site URL. If you let SharePoint create catalog item and category pages automatically when connecting to a catalog, this Source ID is embedded in the page layout. Pretty inconvenient, especially as the Source ID changes each time you re-connect, and you cannot control it by exporting and importing the site search configuration.

After researching alternatives, we ended up using a provisioning script, which would first provision the publishing page layouts, then determine the result source ID, then check out the layouts, replace the ID in them, and check them back in... like I've done in the old days...
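The ID replacement step of such a script can be illustrated with a short Python sketch. It covers only the text substitution; checking layouts out and in and looking up the result source ID are SharePoint operations not shown here. The function name is made up, and the assumption that the ID appears in the layout markup as a "SourceID":"<GUID>" pair (with plain or HTML-encoded quotes) should be verified against your own layouts.

```python
import re

GUID = r"[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
QUOTE = r'(?:"|&quot;)'  # plain or HTML-encoded double quote
SOURCE_ID_PATTERN = re.compile(
    r"(SourceID" + QUOTE + r"\s*:\s*" + QUOTE + r")" + GUID
)


def replace_source_id(layout_markup, new_source_id):
    """Return (updated_markup, replacement_count) with every SourceID GUID swapped."""
    if not re.fullmatch(GUID, new_source_id):
        raise ValueError("new_source_id is not a GUID")
    # Keep the matched prefix (group 1) and substitute only the GUID itself.
    return SOURCE_ID_PATTERN.subn(
        lambda match: match.group(1) + new_source_id, layout_markup
    )


# Hypothetical fragment of a page layout with a stale result source ID.
old_markup = '{"QueryTemplate":"x","SourceID":"11111111-2222-3333-4444-555555555555"}'
new_markup, replaced = replace_source_id(
    old_markup, "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"
)
```

Returning the replacement count lets the calling script fail loudly when a layout contains no Source ID at all, instead of silently checking in an unchanged file.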

The Big Picture

Getting back to our business requirements described in Part 1, by utilizing the XSP approach we created 3 publishing sites, one for each variation. The publishing site corresponding to the source variation is not accessible to Internet users, but the information workers can preview English-level content on it exactly as it would appear to anonymous visitors on the live site. This architecture required us to turn the Pages libraries on each variation label into catalogs, and consume the catalogs from the corresponding publishing sites, meaning that we had to have 3 different setups of the publishing page layouts, pages and catalogs. This may seem like a lot, yet since the pages set up on the publishing sites are not created by information workers, but instead are viewed as a part of the application provisioning exercise, and since they are almost identical across the three publishing sites, it was quite a reasonable decision.

Conclusion

XSP is new, and it can be difficult to depart from the product-centric model used in Microsoft demos. Practically, it all comes down to retrieving content from a search index, something we've been doing for a while with SharePoint. XSP takes the concept to the next level of manageability and implementation convenience, and no longer requires us to write compiled code to take advantage of it. And of course SharePoint sticks to its traditions: planning is key to properly setting up the publishing sites and minimizing duplication of effort. As a result we get modern-looking dynamic publishing web sites and an enterprise-class content management process.
