Condividi tramite


SharePoint 2013 and SQL Server AlwaysOn - High Availability SharePoint

What is AlwaysOn and why use it for SharePoint?

Update: SQL 2014 + SharePoint 2013 (abridged) guide also available. Also if you want to use AlwaysOn for SharePoint Disaster Recovery, then you'll want to have a look at this article - https://blogs.msdn.com/b/sambetts/archive/2015/04/24/setting-up-sharepoint-disaster-recovery-sites-with-sql-alwayson.aspx

AlwaysOn is a great new feature to SQL Server 2012 that allows consuming clients to have a mirroring and failover database solution all transparently for a single named instance. SharePoint is one such consumer of SQL and depends on having a solid connectivity to run. If SQL goes down, so does SharePoint so it’s well within our interests to configure a bullet-proof SQL Server solution that just won’t die.

In this blog post we’ll set up an AlwaysOn cluster for a SharePoint 2013 farm in the following stages:

  1. Create SharePoint databases.
    1. Configure a SQL alias for a consistent SQL instance name.
    2. Create farm on SQL instance/replica 1, creating all databases needed (service-apps & web-apps).
    3. Stop SharePoint so databases remain static during migration to an AlwaysOn cluster.
  2. Create AlwaysOn high-availability group.
    1. Restore all the DBs onto SQL replica 2 (with NORECOVERY).
    2. Create AlwaysOn availability group
    3. Join the replica 2 databases to availability group.
    4. Create listener.
  3. Migrate SharePoint onto AlwaysOn on cluster
    1. Reconfigure SQL alias for new listener.
    2. Restart SharePoint services with updated alias.
  4. Test failover and that SharePoint can survive it.
    1. The failover process in action
    2. Why SharePoint doesn’t survive the 1st time.

Test Environment Details

For this demonstration I’ll be using a “super-cluster” configuration just because it’s the mother of all high-availability solutions. Each member for the AlwaysOn cluster will itself be a failover cluster with shared disks rather than a standalone machine. Why? That way we have instance redundancy and data redundancy too, all for the same logical SQL Server instance. Normally the two instances will be mirroring each others’ data but if one replica node goes offline the affected replica will live on no worries, and even if a whole replica (both nodes) does goes offline we can just failover to the other. The two replica members for this cluster will be:

  • SQL cluster 1 (primary replica) - SQL-SP2013\SP15
  • SQL cluster 2 (secondary replica) - SQL2-SP2013\SP15B

Both these instances are also on their own subnet so we’re covered for a whole bunch of doomsday scenarios – that being the whole point of this setup and blog-post. The parent cluster uses a file-share witness on a 3rd subnet to verify connectivity out of the subnet, or a routing failure.

To span both subnets we’re going to create a single logical SQL Server instance with IPs in both subnets 1 & 2, for SharePoint to connect to.

image

This is the SQL backend we’re aiming for in this example – it’s a cluster of clusters! Please note and to reiterate; for AlwaysOn to work this “cluster-of-clusters” setup isn’t required; you can setup an AlwaysOn cluster with just x2 standalone SQL Servers too, it’s just a single node failure would result in loss of mirroring whereas both replicas being failover clusters themselves allows us to survive up-to 3 simultaneous machine failures in our SQL Server cluster without breaking into a sweat! And that’s the goal here; a zombie SQL setup that just won’t die, for a SharePoint farm to lean on.

Create a SharePoint Farm on an AlwaysOn Cluster

Before we create a farm let’s connect to our two instances that’ll make up the replicas in the AlwaysOn cluster.

clip_image004

Both SQL clusters instances are ready and empty.

Create SharePoint Databases

Prepare SQL Alias

We need to fool SharePoint into thinking the SQL server where we'll create the configuration database is called something else. That’s because when you create a farm, whatever destination server you use during the “create new farm” stage isn’t forgotten (to simplify), so first thing’s first let’s create fake alias for our SQL “server” – “SP15” as it’s SharePoint 2013 that’ll use it (version 15). As far as SharePoint needs to know everything is going into a SQL box called 'SP15'.

Now assuming you don’t have SQL Server tools installed (which you shouldn’t if it’s on production!) you’ll need to use a tool called “cliconfg.exe” to create a SQL alias. Run it; click the “Alias” tab, and the Add button.

Add a name – SP15 for the alias name, but it can be anything - instead of referencing a server-name we’ll reference the name of the alias which will transparently redirect traffic to the alias destination instead.

clip_image005

Install & Configure Farm

The next thing we need to do is create all the databases we’ll need on the farm. For this example I’ll use a standard, home-baked script to create my basic services, accounts, and applications – and their corresponding databases:

  • Farm (1 database; SharePoint configuration)
    • Service Apps:
      • User-profiles (3 databases; profiles; social; sync)
      • Enterprise search (4 databases;
      • State service (1 database)
      • Usage & health service (1 database)
    • Content apps:
      • Default port-80 web-application (1 database)

We’re not so worried about having the perfect configuration at the moment so much as just getting the databases created.

clip_image007

The PowerShell script, doing the business. Notice the “database server” name is our faked alias and yet there’s no problem; aliases are very useful for this.

clip_image008

SQL instance 1 now with all the databases we’re now going to add to a High-Availability (HA), AlwaysOn group.

Stop SharePoint so databases remain static during migration

We now have everything we need to create the single SQL point of access. First thing’s first though, let’s stop any updates happening to any SharePoint database happening until we're all setup by shutting the SharePoint services down – IIS, SharePoint timer are the important ones. We’ll restart them once we’ve updated the alias to use the AlwaysOn group and SharePoint will be none the wiser.

And we’re done with SharePoint for now.

Create SQL Server AlwaysOn Group

Restore all the DBs onto SQL instance 2 (with NORECOVERY)

Next thing we need to do is copy the databases to the 2nd instance. Part of the AlwaysOn wizard can do this automatically for you if both instances have exactly the same database file locations however we can't do this as each failover replica has its own shared cluster-disk for its’ data files, and each disk has its’ own drive letter, therefore the paths will never match-up. And that’s a shame, but we have to do this the hard way.

To join the 2nd replica we need to run a full backup of all the databases on replace 1 but for now we’ll start with just the Central Administration content DB.

clip_image010

Take a full backup. Copy to destination server. Restore.

clip_image012

No need to create a new empty database. Just directly restore & SQL will create the necessary bits & pieces for you.

clip_image014

Important: the restore needs to be done with the NORECOVERY option. If this isn’t done, AlwaysOn can’t use the database.

clip_image015

Once successful, the database will stay in the “restoring” state. Don’t worry about this – it’s just SQL is basically saying the DB isn’t usable from this instance, for now.

Create AlwaysOn availability group on one database

Next, we’re going to create our AlwaysOn instance. We’re going to only add the Central Admin content DB to the group for now…it’ll be clear later why.

On the primary server (SQL-SP2013 in this example), expand “AlwaysOn High Availability” (this requires you to enable AlwaysOn in the SQL Server Configuration Manager) and click “New Availability Group Wizard…”

clip_image017

Add name. This will be the name of the role added to the parent cluster. Click “Next”.

clip_image019

This is where we specify which databases to add to the HA group. We’ve only backed up one the CA content DB so this is only one we can add for now.

Also this screen is a good place to point out something this technique requires that SharePoint doesn’t by default configure – a full recovery model for databases. We’ll change this later, but for now we just want to create the HA group. Click Next.

clip_image021

Here’s where we specify which SQL instances will make up our AlwaysOn HA group. Click “Add Replica” and connect to the 2nd instance. We’ll create a listener later; just accept the defaults for now.

Important notes here about the AlwaysOn configuration:

  • The default is “asynchronous commit” for data-safety settings which basically means the primary instance doesn’t wait for the secondary instance to confirm its’ COMMIT until the primary does its’ own COMMIT for any updates to the AlwaysOn databases. This isn’t necessarily unsafe but there’s a small potential for data loss should the network breakdown between instances during an update of some kind, however performance is much better. We care about safety over speed in this example, so we’re going to “synchronous commit”.
  • You can have the secondary instance as a read-only instance but this slows down the machine so it’s off by default. Obviously if there’s a failover the secondary instance will start accepting sessions again from SharePoint but enabling this is probably not necessary right now.

clip_image023

Here’s where we kick-start the synchronisation. Again, SQL Server can save a lot of pain here & do everything for you (saving you the whole backup/restore process above) but as I mentioned, my SQL instances have different drive-letters for their data-stores so a full synchronisation will never work. We’ll go with “Join”.

clip_image025

After a while, this should be what you see. If you get any error it’s likely because the backup/restore wasn’t done correctly – check the backup was a full backup?

Also it’s worth mentioning there’s some clustering considerations AlwaysOn requires like making sure each replica failover instance can’t be cross-hosted; I’m not sure why this is, but you’ll need to make sure if you have a cluster-of-clusters as I do here each cluster member can only host one failover cluster or the other, but not both.

Assuming it worked you should you be able to right-click on the availability group, select “Dashboard” and see this wonderful screen of green:

clip_image027

A couple of points here:

  • We’ve configured the group for synchronous commits for both nodes. Again, this has a performance hit as a round-trip “ok” is needed before any writes are completed, but it is the safest. This makes little difference for SharePoint sites that are read-heavy compared to writes; a publishing site say, but could drag if lots of updates occur instead, say in a collaboration site. You can run in asynchronous mode which doesn’t wait for the ACK from the secondary node before continuing, but that’s at the risk of losing data.
  • For the same reason the “failover readiness” says “no data loss”. If we were running async commits this would say “data loss” for the secondary databases but this is just over-dramatizing the fact replica 1 doesn’t wait for replica 2 to commit before committing itself, so it can’t know the failover commit occurred and therefore it assumes the worst. If you want absolute guarantees and don’t mind the performance hit, make the AlwaysOn group perform synchronous commits.
  • Green is good; if one day it turns not-green (i.e. you have an unexpected failover or communication failure) then you need to figure that out with your DBA/networking people – don’t blame me, I’m just a SharePoint guy :)

Restore remaining databases to SQL machine 2

Next up; we need to set every database that isn’t already configured to full recovery that mode. I’d point out that there are possible supportability issues here in that the product-group haven’t officially said this is a supported action or not – I say this as a disclaimer on the very slim off-chance there are issues with changing the recovery model on SharePoint databases. There shouldn’t be any issues though although performance is obviously faster if simple transaction-logging is used.

It’s fairly simple to change the recovery model – right-click on a database, open “options”, select the recovery model there and press OK. That’s it – no downtime or restarts or anything needed.

clip_image029

Do this for any database not in full-recovery mode. That’ll be:

  • Any search database.
  • Any user-profile app database.

Next, backup each database as before. Copy as before.

clip_image031

Restore as before (WITH NORECOVERY). Now your SQL Server Management Studio should look something like this:

clip_image033

Scripting restores can be quicker than managing the GUI for multiple DBs at once (if they’re not overly complicated in file-structure that is). All databases are now ready to add to the high-availability group!

Add databases to availability group

On the primary instance, expand the AlwaysOn groups; right click on ours and select “Add Database”. We should now see the following:

clip_image035

Select them all & click next. Again, we’re joining only as we’ve restored the exact copy to the 2nd SQL replica. The process is basically as before just with a “connect” section to authorise.

clip_image037

This is hopefully what you’ll see. If some database(s) don’t join because of insufficient transaction-log data with this error:

The mirror database, "[database]", has insufficient transaction log data to preserve the log backup chain of the principal database. This may happen if a log backup from the principal database has not been taken or has not been restored on the mirror database. (Microsoft SQL Server, Error: 1478)

If you see that persistently despite being sure of a full backup & restore, try again but just backing-up and restoring the transaction log to the same database(s) this time. It worked for me anyway.

Create SQL AlwaysOn Listener

We’re almost done with SQL; our SharePoint databases are mounted and are perfectly syncing one to the other, and in fact there’s nothing stopping us from directly using the old connection-string again knowing that our changes will be mirrored to the 2nd SQL instance. However we want a more automatic failover process than that – we want an abstract instance name that’s independent of which replica is the active one, and that’s what SQL AlwaysOn listeners give us.

Right-click on the HA group “SP15-SQL” again and click “Add Listener”. This is more of a network-admin type thing but the idea is to create IPs in both subnets. We give it a name which will be the equivalent of a generic clustered end-point and a port-number to listen on. Behold:

clip_image039

Click OK. This has now happened:

clip_image041

The listener has a DNS record, computer account in AD, and the assigned IPs. Test it by pinging it. Clustering is awesome!

Add SharePoint Servers to the farm

Configure SQL alias for listener

So that’s about it for the SQL side. We now need to add our SharePoint servers to use this new “server” we’ve setup, but given it’s on a non-standard port we need to use SQL aliases to get there with a simple name. How handy we have one as we needed to fool SharePoint into thinking the SQL Server name so we’re in luck – we’ll just update the alias with the new details!

Open “cliconfg.exe” again and edit the alias. Enter the listener name, select TCP/IP, unselect “dynamically determine port” and enter the port-number we assigned for the listener.

clip_image042

Accept all changes. Repeat this on all SharePoint servers.

Restart SharePoint services (with updated SQL alias)

The final stage in this journey now we’ve moved everything and have updated the alias is to restart all SharePoint services on our starting server and finally add any further servers to the same configuration database necessary. Start IIS & SPTimer and SharePoint will have no idea it’s now looking at a difference SQL instance.

Connection success (again)! Continue the wizard as normal & repeat, lather and rinse for any remaining SharePoint servers.

clip_image044

All SharePoint cares about is that its’ databases are on a SQL Server called “SP15”. Make sure this alias is consistent for all your SharePoint servers.

Test the Failover – Will SharePoint Survive?

So now’s the important bit – the main reason for doing all this is to ensure the continued uptime of SharePoint should the SQL back-end drop unexpectedly (even failover clusters go down sometimes). Suffice to say this blog is long enough as it is; I’ll just show what happens on a manual failover and you’ll have to believe that it kicks in automatically when various nodes go offline or don’t (assuming configured correctly).

The keyword here is “test” because as we’ll see, SQL will failover perfectly to the 2nd replica but SharePoint won’t survive for reasons we’ll see in just a moment…

So once we’re convinced we have both replicas talking and syncing to each other perfectly, we’re going to failover to the 2nd replica (SQL2-SP2013\SP15B). Right-click on the HA group again and select “Failover…”

clip_image046

This confirms what we’re about to do is “safe” – we’re running in synchronous mode so all update transactions are always guaranteed safe. Click next, then connect to the 2nd replica with an appropriate login and start the failover.

clip_image048

Success! If you have the dashboard of the AlwaysOn group of the 1st replica, it might show all sorts of critical errors but it’ll straighten itself out once it works out the primary is now the secondary. Close it and open the dashboard on the previously secondary replica and it should show all OK.

clip_image050

Everything is now being run by our 2nd replica. Ping SP15 and you’ll see the DNS has updated too.

But SharePoint is Down!

Agghh! The farm has been floored! Nightmare!

Yes indeed. This is also why planned, regular failovers are a good idea because you can never be certain of what’ll happen on a failover until it happens. You’ll notice SharePoint may now be complaining that various key accounts are being denied login by SQL Server. Well that simple why; your farm accounts probably aren’t known by the 2nd replica endpoint.

Solution: add the accounts to SQL as per this article (https://technet.microsoft.com/en-us/library/cc678863.aspx) and check everything is working as with both replicas being used. Make sure to check:

  • All web-applications (default web-app; my-sites)
  • All service applications (Search; UPA; Excel Services, etc.)

If something isn’t working with one replica active but on another one it is then there’ll be a difference in permissions. Testing is vital here; make sure you run through every service & app that’ll use different credentials and ensure each one works with each replica member in the cluster being the primary.

Important

Don’t change anything in the failover cluster manager! I’m told by my SQL friends this will render the cluster in an unsupported state as there’s all sorts of meta-data that Management Studio needs & reads which the Cluster Manager does not.

Wrap-Up

This has hopefully shown the benefits in mounting SharePoint in an AlwaysOn cluster and how that can work. Later we’ll experiment with some of the options AlwaysOn gives us and the farm configuration in this example is far from ideal, but the key stage explained here is how to mount the SharePoint databases at least.

SharePoint is increasingly becoming critical to have online; this is yet another way we can make sure it stays online!

 

Cheers,

Sam Betts

Comments

  • Anonymous
    June 26, 2013
    What happens when I create a new webapplication in SharePoint. The alias points to the availability group. So I can't create a database. Just add existing.

  • Anonymous
    June 26, 2013
    You can add new databases but it'll only create it on the active node so if you have a failover, whatever depended on that DB will fail.

  • Anonymous
    July 21, 2013
    Great articles, thanks

  • Anonymous
    September 11, 2013
    Good post- the "But SharePoint is Down" part was very helpful. Many Thanks!

  • Anonymous
    October 16, 2013
    How would your configuration differ if you were in a multi-site DR configuration where only contentdb's are being replicated via Always On?

  • Anonymous
    January 02, 2014
    The comment has been removed

  • Anonymous
    January 02, 2014
    And for that do I need to configure cluster on both boxes?

  • Anonymous
    February 05, 2014
    The comment has been removed

  • Anonymous
    February 05, 2014
    The comment has been removed

  • Anonymous
    March 06, 2014
    The comment has been removed

  • Anonymous
    March 06, 2014
    Nice! Thanks for contributing :)

  • Anonymous
    April 08, 2014
    What about a contained database user ?

  • Anonymous
    April 08, 2014
    Contained databases aren't supported by SharePoint in any way, to the best of my knowledge.

  • Anonymous
    July 09, 2014
    I got a question. Can i create a extra instance on the same server which contains the FailOver? and what happens when i shift the data local to another location?

  • Anonymous
    July 09, 2014
    The comment has been removed

  • Anonymous
    July 09, 2014
    thanks for the answer. I mean a standalone instance with no releationship to the AO

  • Anonymous
    July 09, 2014
    Ah, well you can setup mirroring for most SharePoint DBs I believe and then configure SP to use the 2nd as a failover, but not for everything. That's why AO is a nicer solution - the mirroring & failover is transparent to SP so it all keeps on working even after a failover.

  • Anonymous
    August 21, 2014
    few questons.

  1. sp15 and sp15b are they each a traditional cluster; I mean sharing same storage?
  2. so for sql we shall need 2 windows server+2 windows server and then another 2 for listener. Is that right.
  • Anonymous
    August 21, 2014
    Hi Sardar, Yep, each AlwaysOn member instance in this example are failover clusters with shared storage. You can make an AlwaysOn cluster with just standalone instances no problem though, so just x2 standalone SQL machines in the same Windows cluster would be fine.

  • Anonymous
    September 15, 2014
    Can we use AAG to failover one web application when app updates are made?

  • Anonymous
    September 16, 2014
    Hi Sana; I'm not sure I understand the question. AAG?

  • Anonymous
    September 16, 2014
    First of all GREAT articles Sam! I was also wondering is the contained databases supported and according to this technet article it seems to be... maybe :) :) technet.microsoft.com/.../jj715261(v=office.15).aspx Kirk Evans also posted a Access Service configuration mentioning contained databases / logins blogs.msdn.com/.../configuring-access-services-2013-on-premises.aspx

  • Anonymous
    October 27, 2014
    Helped me to solve my issue!

  • Anonymous
    November 04, 2014
    For the Health and Usage database, is it going to impact performance to put it into the HA cluster? I believe it wasn't recommended to one of the published recommendations from Microsoft to put that database in a HA cluster. Thanks.

  • Anonymous
    May 15, 2015
    Hi Sam, This is a great article and I would like to add AlwaysOn to my existing SharePoint 2013 farm.  One issue I have is that my SQL Server 2012 is Standard and not Enterprise.  Would you know if I can just change the edition to Enterprise using the SQL install media safely?   Thanks! John

  • Anonymous
    May 21, 2015
    Hi John, This does require Enterprise SQL as far as I know. As for how to upgrade, this is apparently the way to do it - msdn.microsoft.com/.../cc707783.aspx // Sam

  • Anonymous
    June 12, 2015
    Just to be clear, the secondary node that has become primary by virtue of failover won't be able to receive updates, etc as there is no configuration to enable this.  

  • Anonymous
    June 16, 2015
    Hi Sam Great article, but I'm afraid the configuration is throwing me. From your initial diagram it gives the impression that you have two Clusters on two subnets - which is what I have, but then I'm reading elsewhere that the nodes must all be in the same cluster? So if I have 4 nodes (2 on each subnet) - do I add the 3rd and 4th servers as nodes to the cluster running on node 1 and 2? And then install SQL Server on to 3 and 4 by choosing "new failover install" or "add node to existing"? Surely then that wouldn't give me to instance names would it? Duncan

  • Anonymous
    June 24, 2015
    Hey Duncan, The setup I had (back then) was 1 Windows Server cluster over 3 subnets, with 2 subnets having their own SQL Server failover instances (2 roles in the cluster) and then AlwaysOn using both logical failover instances to form a 3rd role in the cluster for the AO "cluster". All in the same Windows cluster though. I hope that helps! // Sam

  • Anonymous
    June 24, 2015
    Hey John L, I'm not sure what you mean; with synchronous failover the secondary/new-primary node can update data no problem... // Sam

  • Anonymous
    September 14, 2015
    I have about 25 huge Content Databases. Can someone post the script to restore all the SharePoint Databases (including Content DBs and Service App DBs and SP Config DB) to the new SQL instance with NORECOVERY option(Y/N)? Thanks, AR.

  • Anonymous
    September 16, 2015
    The comment has been removed

  • Anonymous
    November 11, 2015
    The comment has been removed

  • Anonymous
    January 03, 2016
    Hello Sam, I have a question regarding "stage 3"; adding SharePoint servers to the farm; do you mean we should have setup SharePoint on WFE2 and APP2 already?? And then connect with an alias to SQL?? rgds, Khyran Herklots

  • Anonymous
    January 13, 2016
    Hi Khyran, Basically the SPFarm needs to be created (on the 1st server) under the name of the alias. Once that's done, create the same alias on the other servers & add to the farm. // Sam

  • Anonymous
    January 13, 2016
    Hi John, The alias is created on the SharePoint side, on each server. That's just so we can move the alias destination to the listener later for convenience & to create all our SP databases with by pointing directly to a single SQL instance at first. Remember, SQL aliases are resolved client-side at the time of connection. As for question 2; listeners use Windows clustering to create Cluster Name Object for the listener name & redirection. // Sam

  • Anonymous
    March 22, 2016
    Sam- Great post! I have a question. What steps would I take if I already have a live farm created using my primary SQL1 server name instead of an alias? My SQL cluster/Always On groups are setup and working fine. Could I configure the second farm to use my SQL cluster name or SQL server2 name? Thanks for the help!

    • Anonymous
      March 23, 2016
      Sam- I tried adding my second SP server to the farm last night using the cluster name to see if that would work. During the configuration setup it asks if I wanted to host the site from this server or not. Which selection is best to have a full DR server without disrupting the primary server in the farm? Can Central Admin be accessed by both SP servers at the same time, knowing the second will be read only. Also, when reviewing my content database settings I noticed the database server is set to the name of my SQL1 server name. Will this make a difference?
      • Anonymous
        March 23, 2016
        Hey David,You shouldn't need Central Admin "highly available" but you technically could - it's not really any different from any normal app. Just make sure you have the AAMs setup ok. Client-side aliases should sort any name redirection out, even if it's for a real server-name. It's only used by the SQL client.// Sam
    • Anonymous
      March 23, 2016
      Hey David,You can create an alias with the same name as your SQL Server & point it somewhere else; it's resolved the same way.Cheers,Sam
  • Anonymous
    March 23, 2016
    Sam- First of all I hit the wrong reply when adding this earlier....sorry about that.I tried adding my second SP server to the farm last night using the cluster name to see if that would work. During the configuration setup it asks if I wanted to host the site from this server or not. Which selection is best to have a full DR server without disrupting the primary server in the farm? Can Central Admin be accessed by both SP servers at the same time, knowing the second will be read only.Also, when reviewing my content database settings I noticed the database server is set to the name of my SQL1 server name. Will this make a difference?

  • Anonymous
    July 05, 2016
    Thank you for this series of posts. Can I assume that there is no way to use this particular setup as HA during SharePoint patching? I figured this is a no-go as one farm will have db versions that are out-of-sync with the other farm?

  • Anonymous
    July 28, 2016
    What happens if we installed sharepoint and created the farm to a specific SQL server, and then added the DBs to a HAG, and created the alias? Does the alias ABSOLUTELY need to exist first?For example pretend we have two SQL servers SQL12-1 and SQL12-3.Our listener is named "SQL12Listener"If we installed sharepoint directly to SQL12-1, (without an alias built) and then added the DBs to a HAG which is on SQL12-1 and SQL12-3, what should our alias look like now?

  • Anonymous
    February 20, 2017
    The comment has been removed