Monday, June 17, 2013

Thoughts about Building Multilingual Publishing Site on SharePoint 2013 - Part 1 of 3

The publishing approach has changed drastically in SharePoint Server 2013, and Web Content Management (WCM) solution and infrastructure architectures need to be revisited from the ground up as a result. The guidance suggests two content publishing methods, author-in-place and cross-site publishing, and provides a decision flowchart for choosing between them. Although the content deployment publishing method is still supported, it is no longer on the list of recommendations. The reasons for this are unknown to me, but I do know that content deployment does not work with the new and sought-after managed navigation feature. With the above in mind, how do you practically approach building a multilingual, Internet-facing, SharePoint-based WCM site that uses clean URLs and a content publishing process suitable for an enterprise?

This blog post is the first in a 3-part series dedicated to aspects of a multilingual WCM site architecture. It sets the context and describes the architecture of the authoring site.
Part 2 describes managed metadata navigation on the authoring site.
Part 3 describes the architecture of the publishing sites and the use of cross-site publishing.

Setting the Context

At Navantis, we needed to build a bilingual Internet presence publishing site for our Canadian customer, based on an on-premises installation of SharePoint Server 2013. The key requirements at a high level were as follows:
  1. The content of the site should be available in French and English;
  2. Although the authors could publish the French version of a piece of content independently of its English version, most of the time the two versions were required to go live simultaneously, as in the case of company news announcements;
  3. The site had two kinds of content - "non-volatile", updated rarely, and "volatile" corporate blog content updated monthly when new postings were published;
  4. Some pages, in addition to containing content such as text and images, needed to have dynamic elements pointing at other pages, for example a products widget, a blog postings widget, a home page image slider widget, etc.;
  5. English and French versions of the content would be hosted on two sites addressed by two different DNS host names;
  6. Authors and approvers should be able to preview the content with high fidelity to its production appearance, and follow a translation and approval business process before exposing it to the anonymous Internet users.
Pretty typical, I would think...

General Architecture

So we started by planning an application that uses the Cross-Site Publishing (XSP) method and by studying Microsoft guidance, including the case study of the Mavention WCM site. I must point out that the case study and the excellent blog of Mavention's Waldek Mastykarz were very helpful to us. Yet our case was not identical to Mavention's site. The key differences were as follows:
  • We used managed navigation on the authoring site. This was done to give the authors and approvers the same look and feel on the authoring site as on the publishing sites. Here we deliberately deviated from the guidance, which recommends structured navigation for authoring sites that leverage XSP. Another reason for this choice was that it provided an easy fallback to the author-in-place publishing method, should there be implementation issues with the XSP method.
  • We hosted the authoring and publishing sites in two separate web applications in order to separate them physically. The customer's security policy allowed placing both the authoring and the publishing sites in the perimeter network, and so we did.

Assets Site

Some of the key architecture decisions concern where to store and how to publish binary assets created by information workers, such as images and videos. The search index does not store binary content. This means that in an authoring-publishing setup the images need to be either shared or copied between the authoring and publishing sites. While the images that are part of the branding can be copied over during site provisioning, this cannot be done with the images that information workers use when creating content pages.

The two alternatives I see here are developing a solution for synchronizing the images between the authoring and publishing sides, or using a shared assets site. The synchronization solution could theoretically leverage content deployment between source and target site collections for the images, although the information workers would need to maintain "dummy" pages referencing all the used images so that content deployment could pick them up and transfer them over. We chose the same approach as in the Mavention case study: a shared assets site. It is important to note some consequences of this decision:
  • Unless we keep the authoring and publishing sites in the same web application and use path-based site collections, we have to use absolute URIs when referencing the assets (starting with the protocol identifier, e.g. http://www.softforte.com/styles/softforte.jpg). This was our case, based on requirement #5 above.
  • The assets site would need to use the author-in-place publishing method and give information workers authenticated access, while also being publicly available to anonymous users.
  • If you use host-named site collections for your publishing sites, which is generally recommended on the grounds of improved scalability, you further limit the available options: with host-named site collections there is only one security zone, and securing authenticated traffic with SSL becomes problematic.
So we chose to use the unsecured HTTP channel with NTLM authentication for the assets site.
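
To illustrate the first consequence above, here is a minimal sketch of turning an asset reference into an absolute URL that points at a shared assets site. It is an illustration only, not part of our solution: the host name assets.example.com and the helper name to_absolute_asset_url are assumptions made for the example.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical host name of the shared assets site (an assumption, not from the post).
ASSETS_BASE = "http://assets.example.com/"


def to_absolute_asset_url(reference: str, base: str = ASSETS_BASE) -> str:
    """Return an absolute URL for an asset reference.

    References that already carry a scheme and a host are returned unchanged;
    server-relative or relative paths are resolved against the assets site.
    """
    parts = urlsplit(reference)
    if parts.scheme and parts.netloc:
        return reference
    # Drop a leading slash so the path is appended to the base URL.
    return urljoin(base, reference.lstrip("/"))
```

A page on either the English or the French host can then reference the same image through one absolute address, regardless of which DNS name served the page.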

Blogs Site
We chose to keep blog postings in a sub-site of the main authoring site because only one Pages library can be used per web site, and we wanted the two libraries separated so that we could manage publishing workflows for pages and blog posts independently of each other. Since we relied on search web parts to surface content from the blogs, retrieving blog content was not a problem regardless of its location. The only thing to be aware of is the navigation configuration on the blogs site: it needed to be inherited from the parent site in order to maintain its navigation context.

Variations and Publishing Process

To meet requirement #2, which asks for pages in both languages to be published simultaneously, we took a rather elegant approach: we used 3 variations: en-US (source English), en-CA (target English), and fr-CA (target French). The source variation is not visible to Internet users, but it still allows content to be approved and then published in English, an action that triggers propagation of the published updates across the variations. In the en-CA variation label the content then simply gets approved to become published, while in fr-CA it undergoes a translation process and then gets approved and published. This approach lets us publish the French and English versions of a page quasi-simultaneously, leaving it up to the approver to determine the interval between approving the French and English publications.
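
The flow above can be pictured with a small toy model. This is only a sketch of the business process, not SharePoint code: the status names, the Page class and the three functions are all hypothetical and exist purely to show the order of the steps.

```python
from dataclasses import dataclass, field

SOURCE_LABEL = "en-US"  # source variation, hidden from Internet users
# Target labels mapped to whether the content needs translation first.
TARGET_LABELS = {"en-CA": False, "fr-CA": True}


@dataclass
class Page:
    name: str
    # Status per target label, filled in when the source is published.
    status: dict = field(default_factory=dict)


def publish_source(page: Page) -> None:
    """Publishing the source propagates the update to every target label."""
    for label, needs_translation in TARGET_LABELS.items():
        page.status[label] = (
            "awaiting translation" if needs_translation else "awaiting approval"
        )


def translate(page: Page, label: str) -> None:
    """Mark the translation as done so the page can be approved."""
    if page.status.get(label) != "awaiting translation":
        raise ValueError(f"{label} is not awaiting translation")
    page.status[label] = "awaiting approval"


def approve(page: Page, label: str) -> None:
    """Approval is the step that makes a target label's page go live."""
    if page.status.get(label) != "awaiting approval":
        raise ValueError(f"{label} is not awaiting approval")
    page.status[label] = "published"
```

In this model the approver controls the timing: once fr-CA has been translated, approving en-CA and fr-CA back to back makes both versions go live almost at the same time.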

Customizations and Search-Driven Content

We built the site entirely without custom server-side components. All customizations were client-side only, and 80% of them were display templates for the Content Search web parts. These web parts were used for almost every widget, from a page with a dynamic list of blogs, to a widget surfacing blog postings on select pages, to a content rotator with sliding images on the home page. After solving provisioning issues thanks to Chris O'Brien's blog post, we were able to speed up deployment cycles and significantly improve the development process. All in all, it would not be an exaggeration to state that the site was 100% search-driven.

We also used a traditional site search box and a search results page. For a relatively static web presence site we opted not to use a search center and to use a custom search results page instead. We also wanted to be able to search for blogs separately on the blogs listing page. There we used a Content Search web part with a search box web part connected to it, and made the query filter out anything but the blog postings. The global search results page would include a wider set of results and give the user an option to switch to blog results only. I have described earlier how to configure this functionality. Because of our use of variations, each variation label had its own search results page, with a query scoped only to content residing in that variation label's site.
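
As a rough sketch of the scoping idea, the helper below composes a query string that limits results to one variation label site and, optionally, to its blogs sub-site. It assumes the scoping is done with a path restriction in the query; the actual queries we configured in the web parts are not shown in this post, and the function name, the example URL and the "blogs" sub-site name are hypothetical.

```python
def build_scoped_query(user_terms: str, variation_url: str,
                       blogs_only: bool = False,
                       blogs_segment: str = "blogs") -> str:
    """Compose a query limited to one variation label site.

    variation_url is the root of the variation label site;
    blogs_segment is the assumed name of the blogs sub-site.
    """
    scope = variation_url.rstrip("/")
    if blogs_only:
        scope = f"{scope}/{blogs_segment}"
    # Assumption: a path restriction keeps results inside the scope.
    restriction = f'path:"{scope}"'
    terms = user_terms.strip()
    return f"{terms} {restriction}" if terms else restriction
```

For example, build_scoped_query("sharepoint", "http://www.example.com/en-ca/", blogs_only=True) yields a query restricted to the blogs sub-site of the en-CA label.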

For all the advantages of surfacing content in Content Search web parts, I feel that the main drawback is the obscurity of the content selection method, namely the search query issued from a Content Search web part. This obscurity slows down development by making it harder to troubleshoot search queries or search results issues. Still, the benefits clearly outweigh the complexities, and there are several tools at our disposal: first, the Content Search web parts themselves, with their excellent ability to preview query results; second, the Search Query Tool released by Microsoft; last, the REST web interface, which can also be used to troubleshoot queries from a browser. Even so, we cannot fully get away from the high complexity of the search sub-system, and from this point of view it reminds me of writing search-driven web parts in previous versions of SharePoint. If you compare the complexity of troubleshooting your search queries with that of troubleshooting equivalent SQL queries, you will know what I mean. So allow some extra time for getting it right before you start adding a lot of content, because the time a full crawl takes will then increase dramatically. The less time a full crawl takes in your development environment, the sooner you will get the search-driven functionality to a solid state.
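
For troubleshooting through the REST interface, a query can be pasted into the browser's address bar as a URL. The sketch below builds such a URL for the /_api/search/query endpoint with the querytext, selectproperties and rowlimit parameters. The site URL and the selected properties are placeholders, and doubling single quotes inside the quoted query text is an assumption based on the OData string convention.

```python
from urllib.parse import quote


def search_rest_url(site_url: str, kql: str, row_limit: int = 10,
                    select: tuple = ("Title", "Path")) -> str:
    """Build a search REST URL that can be opened in a browser."""
    # Assumption: single quotes inside the quoted value are doubled.
    text = kql.replace("'", "''")
    params = [
        "querytext='" + quote(text, safe="") + "'",
        "selectproperties='" + quote(",".join(select), safe="") + "'",
        f"rowlimit={int(row_limit)}",
    ]
    return f"{site_url.rstrip('/')}/_api/search/query?" + "&".join(params)
```

Opening the resulting URL while signed in to the site returns the raw results, which makes it easier to compare them with what a web part displays for the same query.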

One search-related issue we encountered is worth mentioning: the pages we were authoring had the "Hide physical URLs from search" field set to False by default. As a result, we were getting duplicate sets of results in response to some of our search queries. After setting this field to True and re-indexing the content, only term-driven URLs show up in search results - duplication problem solved!

Master Page, Page Layouts and Branding

In general we used a fairly standard approach that is in principle not much different from the one used in SharePoint 2010. The new Design Manager helps by letting us use the HTML editor of our choice rather than rely on SharePoint Designer. The cost of this convenience was that we had to provision both *.html and *.aspx versions of our page layouts using features, because when only the *.html versions are provisioned by a feature, their conversion to *.aspx files isn't triggered automatically. We needed both kinds of files because during development it is important to have identically configured development environments that are ready for branding work. Exporting and re-importing a design package didn't quite work for us, as we noticed that this process changed columns on intrinsic content types we relied upon - a topic that deserves research and a blog post of its own.

The site used responsive web design, which we chose over the device channels option since it allowed us to stay with the same set of master pages and page layouts and change only the style sheets, with some minimal JavaScript support for the changes in navigation at smaller viewport sizes. This "modern" style impacts not only the design of the master page and page layouts but also the design of the content, in general making it more complicated. In our case we provided sample "Lorem ipsum" content for all of the default pages. The sample content was intended to show the information workers by way of example how to structure the real content and which CSS classes to apply to it.

The next topic, managed metadata navigation, is another key part of a WCM site architecture. It is described in Part 2 of the series.
