How do the big players do it: Azure, Google…

Looking under the hood of the cloud leaders’ architecture, two things stand out immediately:

  • The few figures they disclose are dizzying (yes, 1 peta = 1,000 tera…)
  • Very few technical details are disclosed

As it happens, Google and Azure recently released some information about their networks :)
Facebook is further ahead and shares more.

Microsoft Azure

For Azure, here are slides from the master, Mark Russinovich (click the image to get the PowerPoint deck, 29 slides):

2015-06-19_14-37-27

The biggest news: they use FPGAs (programmable logic chips) to handle the network layer at 40GbE (or to mine bitcoins, who knows!).

The last time I got valuable information was at TechEd 2012, again from Mark. The video is still online:

  • ~10 people to administer around 100,000 servers!
  • A demo of one of the Azure admin interfaces,
  • A graphical view of the racks and their VMs,
  • A demo of platform self-healing,
  • An explanation of the leap day bug (February 29), down to the line of source code that broke everything

Google

It looks like they index everything except content about their own architecture (with a killer robots.txt ;)
TechCrunch gathered some interesting information, again about their network management (SDN):

http://techcrunch.com/2015/06/17/google-pulls-back-curtain-on-its-data-center-networking-setup/

In 2009, they published a tour of one of their datacenters, with the servers housed inside a container:

2015-06-19_14-56-27

Facebook

They are chattier, maybe it’s in their DNA…:

https://code.facebook.com/posts/360346274145943/introducing-data-center-fabric-the-next-generation-facebook-data-center-network

Figure 2:

A Facebook server (old model I guess):

The back really is empty:

Office 365

Most of the available information is about Exchange:

  • Backupless (no traditional backups),
  • 3 replicas + 1 lagged copy at 8 days (the lag drops to 0 in case of issues),
  • JBOD storage (1 database per hard drive),
  • Optimizations through Exchange 2016

Office 365 is independent of Azure, although some of its services use it, and I guess a merger is coming.

Amazon

  • They use Xen for virtualization,
  • They also build their own servers,
  • Based on open source

Conclusions

They have hyperscale needs, but also hyperscale resources to meet them:

  • Complete control of the entire chain: datacenter, network, servers, OS, hypervisor, applications, load balancers, SQL engines
  • Development in the low layers: SDN, FPGA…
  • Source code (and the people to work on it): Windows for Microsoft, Linux for Google, Facebook and Amazon
  • They use open source (OpenFlow, memcached, Hadoop…) but extend it,
  • If they decide it matters, they can spend huge amounts of money on a topic (like Google with its SDN).

All this gives them strong independence from other companies and their potential buyouts. Other efforts pay off through sheer volume (e.g. gaining $50 per server, × 300,000 servers).
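To put the volume argument in numbers, here is a back-of-the-envelope sketch in Python. The $50 and 300,000 figures are the illustrative ones from the sentence above, not disclosed data, and `fleet_gain` is a hypothetical helper name.

```python
def fleet_gain(gain_per_server: int, servers: int) -> int:
    """Total gain when a small per-server gain is applied to a whole fleet."""
    return gain_per_server * servers


# Illustrative figures from the text: $50 per server, 300,000 servers.
total = fleet_gain(50, 300_000)
print(f"${total:,}")  # $15,000,000
```

A gain that would not justify an engineering project on a few dozen servers becomes a multi-million budget line at this scale.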

Where we generally stop, happily (geo-cluster, DRP, LB), is the bare minimum for them. Once that is done, a whole other road opens up: handling load.

 

news

As you may have noticed, I just changed the site’s theme. After years of loyal service, the Mystic theme didn’t survive the latest WordPress upgrade (memory fault)… I hope you will enjoy this theme as much as the previous one!

I also migrated the architecture, for testing purposes. You won’t have noticed, but this blog now runs on 8 VMs:

  • 2 load balancers running Zen Load Balancer
  • 2 web front ends (Apache/PHP…)
  • 2 GlusterFS nodes (distributed filesystem, hosting the WordPress site data)
  • 2 MariaDB Galera Cluster nodes (WordPress database)

All hosted on a single VMware node, since the point is to test the architecture :)

SharePoint – DateTimeControl – unexpected error – iframe – Object reference not set to an instance of an object

Issue

Inside an existing solution, I use a DateTime control:

<SharePoint:DateTimeControl ID="DateTimeControl1" DateOnly="true" ShowWeekNumber="true" FirstDayOfWeek="1" FirstWeekOfYear="2" runat="server" EnableViewState="true"></SharePoint:DateTimeControl>

Instead of the date picker, I got the following error (An unexpected error has occurred):

2014-12-10_09-45-35

 

In the ULS logs:

Medium Application error when access /_layouts/15/iframe.aspx, Error=Object reference not set to an instance of an object.
 at Microsoft.SharePoint.Utilities.SPUtility.GetThemedImageUrl(String originalUrl, String themeKey)
 at Microsoft.SharePoint.WebControls.DatePicker..ctor()
 at Microsoft.SharePoint.WebControls.SPDatePickerControl.InitDatePicker()
 at Microsoft.SharePoint.WebControls.SPDatePickerControl.set_ShowWeekNumber(Boolean value)
 at Microsoft.SharePoint.ApplicationPages.DatePickerFrame.Page_Load(Object sender, EventArgs e)
 at System.EventHandler.Invoke(Object sender, EventArgs e)
 at System.Web.UI.Control.LoadRecursive()
 at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)
Unexpected System.NullReferenceException: Object reference not set to an instance of an object.
 at Microsoft.SharePoint.Utilities.SPUtility.GetThemedImageUrl(String originalUrl, String themeKey)
 at Microsoft.SharePoint.WebControls.DatePicker..ctor()
 at Microsoft.SharePoint.WebControls.SPDatePickerControl.InitDatePicker()
 at Microsoft.SharePoint.WebControls.SPDatePickerControl.set_ShowWeekNumber(Boolean value)
 at Microsoft.SharePoint.ApplicationPages.DatePickerFrame.Page_Load(Object sender, EventArgs e)
 at System.EventHandler.Invoke(Object sender, EventArgs e)
 at System.Web.UI.Control.LoadRecursive()
 at System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint)
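ULS entries are often copied out with the whole stack trace on a single line, which hides the one frame that matters: the top one. A minimal Python sketch for splitting such an entry into its message and frames follows; `split_trace` is a hypothetical helper, and the " at " separator rule is an assumption based on the traces shown in this post.

```python
from __future__ import annotations

import re

# A frame starts with " at " followed by a method name and an opening parenthesis.
_FRAME_START = re.compile(r"\s+at\s+(?=\S+\()")


def split_trace(entry: str) -> tuple[str, list[str]]:
    """Split a one-line log entry into (message, list of stack frames)."""
    message, *frames = _FRAME_START.split(entry.strip())
    return message, frames


# Shortened copy of the entry above (first two frames only).
entry = (
    "Application error when access /_layouts/15/iframe.aspx, "
    "Error=Object reference not set to an instance of an object. "
    "at Microsoft.SharePoint.Utilities.SPUtility.GetThemedImageUrl(String originalUrl, String themeKey) "
    "at Microsoft.SharePoint.WebControls.DatePicker..ctor()"
)
message, frames = split_trace(entry)
print(message)
print("top frame:", frames[0])
```

Here the top frame is `GetThemedImageUrl`, called from the `DatePicker` constructor, which is where the null reference is thrown.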

 

Solution

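You need to create a root site collection (/) in the web application. As the log shows, the date picker loads /_layouts/15/iframe.aspx from the root, and that page fails when no site collection exists there.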

Easy, but SharePoint is capricious as always.

 

Project Server 2013 – Access to the registry key ‘Global’ is denied

Issue

Every minute, in the ULS logs, you get this error:

  • Process: Microsoft.Office.Project.Server
  • Product: Project Server
  • Category: Project Calculation Service
Timer Task thread crashed System.UnauthorizedAccessException: Access to the registry key 'Global' is denied. 
 at Microsoft.Win32.RegistryKey.Win32Error(Int32 errorCode, String str) 
 at Microsoft.Win32.RegistryKey.InternalGetValue(String name, Object defaultValue, Boolean doNotExpand, Boolean checkSecurity) 
 at Microsoft.Win32.RegistryKey.GetValue(String name) 
 at System.Diagnostics.PerformanceMonitor.GetData(String item) 
 at System.Diagnostics.PerformanceCounterLib.GetPerformanceData(String item) 
 at System.Diagnostics.PerformanceCounterLib.get_CategoryTable() 
 at System.Diagnostics.PerformanceCounterLib.CounterExists(String category, String counter, Boolean& categoryExists) 
 at System.Diagnostics.PerformanceCounterLib.CounterExists(String machine, String category, String counter) 
 at System.Diagnostics.PerformanceCounter.InitializeImpl() 
 at System.Diagnostics.PerformanceCounter..ctor(String categoryName, String counterName, String instanceName, Boolean readOnly) 
 at System.Diagnostics.PerformanceCounter..ctor(String categoryName, String counterName, String instanceName) 
 at Microsoft.Office.Project.Server.BusinessLayer.PcsEngine.PcsPerfCounter.<.ctor>b__0() 
 at System.Lazy`1.CreateValue()
--- End of stack trace from previous location where exception was thrown ---
 at System.Lazy`1.get_Value()
 at Microsoft.Office.Project.Server.BusinessLayer.PcsEngine.PcsPerfCounter.EnsureSampleDataTask()
 at Microsoft.Office.Project.Server.BusinessLayer.PcsEngine.PcsTaskWorker.PerformTasksCallback(Object obj)
StackTrace:
 at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx) 
 at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx) 
 at System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem() 
 at System.Threading.ThreadPoolWorkQueue.Dispatch()

Solution

Project Server maintains its own performance counters, so its service account needs the rights to do so. Add the account that runs the Project Server services to the following local Windows groups:

  • Performance Log Users
  • Performance Monitor Users

Then restart the Project Server services once the job queue is empty.
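On a farm with several application servers, the group change is easier to script than to click through. The sketch below builds the standard `net localgroup … /add` commands for both groups and only prints them by default; the account name `CONTOSO\svc_project` is a placeholder, and `localgroup_commands` and `apply` are hypothetical helper names.

```python
from __future__ import annotations

import subprocess

# Built-in local groups required for the performance counters.
GROUPS = ("Performance Log Users", "Performance Monitor Users")


def localgroup_commands(account: str) -> list[list[str]]:
    """Build one `net localgroup <group> <account> /add` command per group."""
    return [["net", "localgroup", group, account, "/add"] for group in GROUPS]


def apply(account: str, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in localgroup_commands(account):
        # Quote arguments containing spaces so the printed line can be pasted as-is.
        print(" ".join(f'"{arg}"' if " " in arg else arg for arg in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # needs an elevated prompt on the server


# Placeholder account name: replace with the real Project Server service account.
apply(r"CONTOSO\svc_project")
```

Run it with `dry_run=False` from an elevated prompt on each server that hosts the Project Server services.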

SharePoint 2013 – NodeRunnerQuery and NodeRunnerAdmin – SQL access errors

Issue

In the ULS logs, you get these errors in a loop:

  • Unable to get sql session to admin database
  • NerioCluster: No lease returned when reading DB

2014-10-05_10-58-11

Solution

The Search Host Controller Service is started, but no search service application exists in the farm.

Either create a search service application or stop this service from Central Administration:

2014-10-05_11-01-38

SharePoint 2013 – MsiInstaller – 1015 – Failed to connect to server. Error: 0x80070005

Issue

Every night, in the event log, you find this warning:

Failed to connect to server. Error: 0x80070005

2014-08-03_16-37-25

But you don’t find the COM errors that usually accompany it.

Solution

  1. Temporarily make the farm admin account (the one that runs the SharePoint Timer service) a local administrator
  2. Through SharePoint PowerShell: Get-SPProduct -Local
  3. Through SharePoint PowerShell: (Get-SPServer $env:computername).NeedsUpgrade
  4. Manually start the timer job that generates this warning: Get-SPTimerJob job-admin-product-version | Start-SPTimerJob
  5. Check that the warning no longer occurs
  6. Remove the account from the local Administrators group
  7. Restart the SharePoint Timer service
  8. Manually start the same timer job again: Get-SPTimerJob job-admin-product-version | Start-SPTimerJob
  9. Check that the warning still no longer occurs

Windows – performance counters – unable to add these counters

Issue

One of my Windows Server 2012 machines stopped reporting network performance counters. Suddenly. Even perfmon itself complains on startup:

2014-08-20_21-49-38

 

Solution

Running lodctr /R doesn’t help. The counter is there but disabled, as lodctr.exe /Q shows:

[Tcpip] Performance Counters (Disabled)
 DLL Name: %SystemRoot%\System32\Perfctrs.dll
 Open Procedure: OpenTcpIpPerformanceData
 Collect Procedure: CollectTcpIpPerformanceData
 Close Procedure: CloseTcpIpPerformanceData

 

So we have to enable it again:

lodctr.exe /E:Tcpip
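The full lodctr /Q listing is long, so a disabled provider is easy to miss. Here is a minimal Python sketch that picks out the disabled ones and prints the matching enable commands. It assumes every provider header looks like the `[Tcpip] Performance Counters (Disabled)` line above; the `[Example]` entry in the sample is made up for illustration, and `disabled_providers` is a hypothetical helper name.

```python
from __future__ import annotations

import re

# Header line of each provider block, e.g. "[Tcpip] Performance Counters (Disabled)".
_HEADER = re.compile(
    r"^[ \t]*\[(?P<name>[^\]]+)\][ \t]+Performance Counters[ \t]+\((?P<state>\w+)\)",
    re.MULTILINE,
)


def disabled_providers(query_output: str) -> list[str]:
    """Return the names of providers that the listing reports as disabled."""
    return [
        m["name"]
        for m in _HEADER.finditer(query_output)
        if m["state"].lower() == "disabled"
    ]


# Sample listing: [Tcpip] is copied from above, [Example] is a made-up enabled entry.
sample = r"""[Example] Performance Counters (Enabled)
 DLL Name: example.dll
[Tcpip] Performance Counters (Disabled)
 DLL Name: %SystemRoot%\System32\Perfctrs.dll
 Open Procedure: OpenTcpIpPerformanceData
"""

for name in disabled_providers(sample):
    print(f"lodctr.exe /E:{name}")  # lodctr.exe /E:Tcpip
```

On a real server, the input would be the captured output of lodctr.exe /Q instead of the sample string.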

 

The story doesn’t say why it happened in the first place…

 

 

Microsoft Project – deliverable – files srchui.dll and jscript.dll

In Microsoft Project 2010 and 2013, using the deliverables feature gives this error message:

23-03-2014 11-49-30

This message looks like it came straight from the graveyard of Project 2003.

There is no point searching your hard drive for these files: you don’t have them.

The real problem is the security level of the “Internet” zone in Internet Explorer, even if your server is in the “Intranet” zone.

The Internet zone must not be set to the highest level, or deliverables won’t work.

26-03-2014 07-43-13

This even happens with Windows Server 2012 R2 + Project 2013 SP1 + IE 11, fully updated…

 

 

SharePoint/Project 2013 SP1 – Configuration wizard

After deploying Service Pack 1 for SharePoint/Project 2013, you have to run the configuration wizard. In this case, it failed on all nodes with this error in the log:

SyncUpgradeTimerJob: sleeping for 10 seconds
SyncUpgradeTimerJob: sleeping for 10 seconds
SyncUpgradeTimerJob: Upgrade timer job failed. Return -1.
The exclusive inplace upgrader timer job failed.

 

Step 1 – other log

Another log, in the same location, is much more useful. It is named something like upgrade-date-blahblah-error.log:

Exception: The operation cannot be performed on database "WSS_UsageApplication" because it is involved in a database mirroring session or an availability group. Some operations are not allowed on a database that is participating in a database mirroring session or in an availability group.  ALTER DATABASE statement failed

Or by using the Upgrade-SPFarm command:

29-03-2014 12-55-19

 

Answer

Remove SQL mirroring or the AlwaysOn availability group from this database for the duration of the upgrade.

 

 

RDP 2012(R2) – session collection – profile disk – 800391163

When creating a session collection on Windows Server 2012/2012 R2 and specifying a share to host the profile disks, you get error 800391163:

rdp2012_session_profile_disk_800391163

While applying the settings, one of the RDP servers creates the VHD template on the share as Local System:

rdp2012_session_profile_disk_procmon

In detail:

rdp2012_session_profile_disk_procmon2

Even if the computer objects have full control at the NTFS level, it still fails.

You have to give “Full Control” to “Everyone” on the share; “Change” is not enough:

rdp2012_session_profile_disk_share

Another closed case ^^