Troubleshoot Deployments of Windows Azure Nodes with Microsoft HPC Pack

Applies To: Microsoft HPC Pack 2008 R2, Microsoft HPC Pack 2012, Microsoft HPC Pack 2012 R2, Windows HPC Server 2008 R2

This topic contains information to help you or Microsoft Support troubleshoot deployments of Windows Azure nodes with HPC Pack.

For general requirements and best practices to deploy Windows Azure nodes with HPC Pack, see the following:

In this topic:

  • General deployment troubleshooting guidance

  • Trace log files on Windows Azure nodes

  • Scenarios to store trace log data

    • Scenario 1: Enable automatic transfer of trace log files to Windows Azure blob storage

    • Scenario 2: Enable automatic transfer of trace log data to Windows Azure table storage

    • Scenario 3: Manually retrieve and store log files from Windows Azure nodes

  • View log data in a Windows Azure storage account

General deployment troubleshooting guidance

  • If there is a problem with your Internet connection or with the Windows Azure subscription information provided in the node template, the Windows Azure node deployment can fail. You can validate the connection settings for Windows Azure in the node template. Open the template in Node Template Editor. Then, on the Connection Information tab, click Validate connection information.

  • If there is a problem with the configuration of the Windows Azure management certificate, see Troubleshoot certificate configuration problems.

  • If you are running at least HPC Pack 2008 R2 with SP2, you can run the Windows Azure Firewall Ports diagnostic test and the Windows Azure Services Connection diagnostic test to help verify that the network firewall and other settings are properly configured for communication between HPC Pack and Windows Azure or to troubleshoot connectivity problems.

  • If the system time is not set accurately on the head node computer (or head node computers), certain Windows Azure operations such as node template creation or deploying new nodes can fail with an error similar to the following:

    Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.
    
  • If you experience partial failures in deployment, where the Windows Azure nodes do not come online, you can try running the following telnet command to see whether the cloud service specified in the node template is reachable at the Windows Azure endpoint:

    telnet <ServiceName>.cloudapp.net 7999
    

    Note

    To run this command, the Telnet Client feature must be installed in the operating system. For information about how to install Telnet Client by using Server Manager, see the Telnet Operations Guide.

  • A problem in Windows Azure can affect a subset of the Windows Azure nodes that are in a set. For example, if you are starting a large number of nodes, it is possible for the deployment to fail on one or more nodes. In this case, you will see appropriate status information for the failed nodes in Node Management.

  • Deployment status information appears in the service account information in the Windows Azure Management Portal. HPC Cluster Manager regularly queries this portal for updated status information. However, the information in the portal can differ from that in the provisioning logs or the operations log in HPC Cluster Manager.

  • If a deployment error occurs in Windows Azure, an error message and troubleshooting information may appear in the cloud service information in the Windows Azure Management Portal, or in the provisioning log in HPC Cluster Manager. If you cannot resolve the problem, you can review the trace logs that are generated on the role instances in the deployment. For more information, see Trace log files on Windows Azure nodes in this topic.

    You can also visit Windows Azure Support. To assist in troubleshooting the problem, be prepared to provide the subscription ID that is configured in the node template and the deployment ID that appears in the provisioning log in HPC Cluster Manager and in the portal.

  • After you have provisioned a set of nodes in Windows Azure, you can start an additional set of nodes using the same node template. However, in some cases, the additional nodes fail to come online in HPC Cluster Manager, but they appear to be deployed successfully in Windows Azure. If this occurs, it may not be possible to use HPC Cluster Manager to stop or delete the failed nodes. If necessary, first stop and restart the HPC Management Service. Then, to delete the nodes, use the Windows Azure Management Portal.

  • Starting with HPC Pack 2012 with SP1, to help troubleshoot Windows Azure node deployments, you can opt to collect data about the availability, connectivity, and performance of Windows Azure nodes on the head node and send that data to Microsoft. You might choose to do this if you need to open a support incident related to a Windows Azure node deployment. To enable data collection, in HPC Cluster Manager, on the Options menu, click Windows Azure Support Data Collection. Alternatively, configure the AzureMetricsCollectionEnabled cluster property by using the Set-HpcClusterProperty HPC PowerShell cmdlet. For more information about the data collection, see the Microsoft HPC Pack Privacy Statement.
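The telnet check described in the list above only tests whether a TCP connection to port 7999 of the cloud service can be opened. Where the Telnet Client feature is not installed, a rough equivalent can be sketched in Python. This is an illustration under stated assumptions, not an HPC Pack tool: the function name is hypothetical, and the host name must be replaced with the cloud service name from your node template.

```python
import socket


def is_endpoint_reachable(host: str, port: int = 7999, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it mirrors what `telnet <ServiceName>.cloudapp.net 7999`
    tests, namely that the endpoint accepts a TCP connection.
    """
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

For example, `is_endpoint_reachable("<ServiceName>.cloudapp.net")` returns False if the name cannot be resolved or the port is blocked. A True result only shows that the port accepts connections; it does not confirm that the HPC Pack services behind it are healthy.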

Trace log files on Windows Azure nodes

Starting with HPC Pack 2008 R2 with SP4, trace log files are generated automatically on Windows Azure worker nodes, and on the Windows Azure HPC proxy nodes that are automatically provisioned for each deployment. The log files can help you or Microsoft Support troubleshoot issues during or after node provisioning – for example, conditions that can cause a Windows Azure node to show a health state of Unreachable or Error, even though the Windows Azure Management Portal might indicate a status of Ready.

The trace log files contain the following types of information about each node:

  • Bootstrapping information for the operating system.

  • Information about the HPC Pack services that should run on the node.

  • Information about the Hosts file.

  • Operating system performance counter data.

The log files are written to local storage on each node, as shown in the following table. The formats, characteristics, and naming of the trace log files depend on the version of HPC Pack.

Important

The log files are only maintained in local storage on the Windows Azure role instances while the nodes remain provisioned in Windows Azure. Unless the files or data are copied to another location, you will not be able to review the trace log information after the Windows Azure nodes are stopped or deleted. For more information, see Scenarios to store trace log data in this topic.

Version of HPC Pack: HPC Pack 2012

  Log files:

  • Worker nodes   C:\logs\hpcworker_nnnnnn.bin

  • Proxy nodes   C:\logs\hpcproxy_nnnnnn.bin

  Notes:

  • Log files are in binary format. The default logging level is Verbose.

  • Each log file is a maximum of 4 MB in size by default, and a maximum of 5000 MB of log files can be stored on each node.

  • To facilitate analysis, log files can be converted to tab-separated text files by running the parselog subcommand of the hpctrace command-line tool that is installed with HPC Pack. For more information about using this command, see hpctrace.

Version of HPC Pack: HPC Pack 2008 R2 with SP4

  Log files:

  • Worker nodes   C:\logs\hpcworker.log

  • Proxy nodes   C:\logs\hpcproxy.log

  Notes:

  • Log files are in text format. The default logging level is Warning or higher.

  • Up to five numbered overflow log files with extension .00<Integer> are written on each node.

  • Log files on each node are limited to 60 MB and then cycled automatically.
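After an HPC Pack 2012 binary log has been converted to tab-separated text with hpctrace parselog, the text can be filtered with ordinary tools. The following Python sketch is one way to do that. It assumes nothing about the column layout of the converted file, which this topic does not describe, so it keeps any row in which some field contains the keyword. The function name and the file name in the usage note are hypothetical.

```python
import csv
from typing import Iterable, Iterator, List


def filter_trace_rows(lines: Iterable[str], keyword: str) -> Iterator[List[str]]:
    """Yield tab-separated rows in which any field contains keyword.

    The match is case-insensitive. No column layout is assumed, because the
    format produced by the conversion is not documented in this topic.
    """
    needle = keyword.lower()
    # QUOTE_NONE treats quote characters in log messages as ordinary text.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if any(needle in field.lower() for field in row):
            yield row
```

A typical use would be to open the converted file (for example, a hypothetical hpcworker_000000.txt) with `open(path, newline="")` and pass the file object to `filter_trace_rows(f, "error")`.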

Note

You can use the Configure settings for the cloud service in the Windows Azure Management Portal to change the tracing level for specific processes on the Windows Azure nodes (such as Microsoft.Hpc.Azure.AzureNodeManagerTracing).

Scenarios to store trace log data

The trace log files generated on Windows Azure role instances remain in local storage on the role instances as long as the role instances are running. However, if you want to access the data after a Windows Azure deployment is stopped or the nodes are deleted, you need to download or store the trace log files or data in persistent storage, such as Windows Azure storage, while the role instances are running. The following are scenarios to store trace log files or data.

Scenario 1: Enable automatic transfer of trace log files to Windows Azure blob storage

Starting with HPC Pack 2012 with SP1, the HPC cluster administrator can optionally enable the automatic transfer of trace log files from the Windows Azure compute or proxy nodes in a deployment to a container in blob storage (hpclogs) in the Windows Azure storage account for the deployment.

To enable automatic transfer of trace log files to blob storage in the Windows Azure storage account, in HPC Cluster Manager, on the Options menu, click Windows Azure Deployment Configuration. You can also set the AzureLogstoBlob HPC cluster property by using the Set-HpcClusterProperty HPC PowerShell cmdlet. You can choose to transfer logs for proxy nodes, worker nodes, or both. By default, transfer of log files to blob storage is disabled. Changing the AzureLogstoBlob property affects only future Windows Azure node deployments; current deployments are not affected. For more information, see Set-HpcClusterProperty.

Important

Saving Windows Azure deployment log files in blob storage uses storage space and generates storage transactions on the storage account associated with each deployment. If enabled, saving log files from worker nodes can affect the performance of all Windows Azure deployments that use the same storage account, especially if you have large deployments, or several concurrent deployments. The storage space and the storage transactions will be billed to your account. After you disable transfer of log files, the log files will not be automatically removed from Windows Azure storage. You may want to keep the log files for future reference by downloading them. The log files can be cleaned up by removing the hpclogs container from your storage account.

You can run the hpcazurelog command on the head node to download data from blob storage in the storage account to a local folder and to delete the files from blob storage. For more information, see hpcazurelog.

Scenario 2: Enable automatic transfer of trace log data to Windows Azure table storage

Starting with HPC Pack 2012, the HPC cluster administrator can optionally enable the transfer of trace log data from the Windows Azure nodes in a deployment to a Windows Azure diagnostics table (WADLogsTable) created for this purpose in the Windows Azure storage account for the deployment.

To enable transfer of trace log data to the WADLogsTable table in the Windows Azure storage account, set the AzureLoggingEnabled HPC cluster property to true by using the Set-HpcClusterProperty HPC PowerShell cmdlet. By default, only Critical, Error, and Warning events in the log files are selected for inclusion in the WADLogsTable table. Changing the AzureLoggingEnabled property affects only future Windows Azure node deployments; current deployments are not affected. For more information, see Set-HpcClusterProperty.

Important

  • Logging of Windows Azure deployment activities uses table storage space and generates storage transactions on the storage account associated with each deployment. The storage space and the storage transactions will incur charges according to the terms of the Windows Azure subscription.

  • Logging to Windows Azure storage should generally be enabled only when problems occur with the deployment, to aid in troubleshooting them. After you disable logging to Windows Azure storage, the log data will not be automatically removed from Windows Azure storage. You may want to keep the logs for future reference by downloading them. The log entries can be cleaned up by removing the WADLogsTable from your storage account.

Starting with HPC Pack 2012 with SP1, you can run the hpcazurelog command on the head node to download data from the WADLogsTable in the storage account to a local folder, and to specify the trace level of the data selected for storage in the table. For more information, see hpcazurelog.

Scenario 3: Manually retrieve and store log files from Windows Azure nodes

To facilitate further analysis, you can manually download log files from Windows Azure nodes to an on-premises computer, or upload them to a Windows Azure storage account.

Download log files

To download the log files, you can use one of the following procedures:

  • Run the hpcfile get command to download log files from each node individually.

  • Run a script that uses hpcfile get to download files from groups of worker nodes.

  • Use the Windows Azure Management Portal to connect remotely to each node individually. You can then copy the log file or files to a local computer.

  • Run the hpcazurelog command on the head node to download files from Windows Azure worker nodes or proxy nodes. This command was introduced in HPC Pack 2012 with SP1 and is not supported in previous versions. For more information, see hpcazurelog.

Note

  • To make a remote connection to a Windows Azure node, ensure that you configure Remote Desktop credentials in the Windows Azure node template.

  • To download the log files from HPC proxy nodes, you must make a remote connection to each node, and then copy log files individually to a local computer.

The following are example commands and scripts that use hpcfile get to download the log files from Windows Azure worker nodes. For more information about command syntax, see hpcfile.

Example 1. To download the trace log files, including possible overflow files, from the Windows Azure node AZURECN-001 on a cluster with an HPC Pack 2008 R2 with SP4 head node named myHeadNode to the current folder on the local computer, renaming the files to avoid overwriting files on the local computer:

hpcfile get /scheduler:myHeadNode /targetnode:AZURECN-001 /file:"C:\logs\hpcworker.log" /destfile:"worker001.log"
hpcfile get /scheduler:myHeadNode /targetnode:AZURECN-001 /file:"C:\logs\hpcworker.log.001" /destfile:"worker002.log"
hpcfile get /scheduler:myHeadNode /targetnode:AZURECN-001 /file:"C:\logs\hpcworker.log.002" /destfile:"worker003.log"
hpcfile get /scheduler:myHeadNode /targetnode:AZURECN-001 /file:"C:\logs\hpcworker.log.003" /destfile:"worker004.log"
hpcfile get /scheduler:myHeadNode /targetnode:AZURECN-001 /file:"C:\logs\hpcworker.log.004" /destfile:"worker005.log"
hpcfile get /scheduler:myHeadNode /targetnode:AZURECN-001 /file:"C:\logs\hpcworker.log.005" /destfile:"worker006.log"
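
The six commands in Example 1 follow a fixed pattern: the base log file plus up to five overflow files with extensions .001 through .005, each saved under a sequentially numbered local name. As a minimal sketch, the same command lines can be generated in Python rather than typed by hand. The function name is hypothetical, and the sketch only builds the command strings shown above; it does not run hpcfile.

```python
from typing import List


def build_hpcfile_commands(
    head_node: str,
    target_node: str,
    overflow_count: int = 5,
    log_path: str = r"C:\logs\hpcworker.log",
) -> List[str]:
    """Build the hpcfile get command lines from Example 1 for one node."""
    # Base log first, then overflow files hpcworker.log.001 ... .00N.
    sources = [log_path] + [
        f"{log_path}.{n:03d}" for n in range(1, overflow_count + 1)
    ]
    return [
        f"hpcfile get /scheduler:{head_node} /targetnode:{target_node} "
        f'/file:"{src}" /destfile:"worker{idx:03d}.log"'
        for idx, src in enumerate(sources, start=1)
    ]
```

Calling `build_hpcfile_commands("myHeadNode", "AZURECN-001")` returns the six command lines of Example 1 in the same order. Overflow files that do not exist on the node would presumably cause the corresponding hpcfile get command to fail, so any script that runs these commands should tolerate individual failures.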

Example 2. To download the hpcworker_000000.bin log files from the Windows Azure nodes in node group WorkerNodes with names beginning AZURECN on a cluster with an HPC Pack 2012 head node named myHeadNode to the C:\myFiles\myLogs folder on the local computer:

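@echo off
REM File extension and local destination folder for the downloaded logs
set "extension=.bin"
set "fullfilepath=C:\myFiles\myLogs"
if not exist "%fullfilepath%" mkdir "%fullfilepath%"
REM For each node in WorkerNodes whose name contains AZURECN-, save its log as <NodeName>.bin
FOR /F "tokens=1 delims= " %%G IN ('node list /group:WorkerNodes ^| FIND "AZURECN-"') DO hpcfile get /scheduler:myHeadNode /targetnode:%%G /file:"C:\logs\hpcworker_000000.bin" /destfile:"%fullfilepath%\%%G%extension%"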

Upload log files to the Windows Azure storage account

You can use one of the following procedures to upload the trace log files from Windows Azure worker nodes to a Windows Azure storage account:

  1. Download one or more log files to a local computer as described in the previous section, and then upload them to a Windows Azure storage account by running the hpcpack upload command.

  2. Run a script on one or more Windows Azure nodes that uses hpcpack upload to upload the log files directly to the storage account.

    Note

    • To run a script on a group of Windows Azure nodes, you can first upload the script from a local computer to the nodes.

    • As described in Scenario 1: Enable automatic transfer of trace log files to Windows Azure blob storage, starting with HPC Pack 2012 with SP1, you can enable automatic transfer of trace log files to blob storage in the Windows Azure storage account. However, if you are not using a version of HPC Pack that supports this capability, or you have not enabled automatic transfer of log files to blob storage, you can manually upload them to that location.

The following are example scripts that use hpcpack upload to upload the log files from Windows Azure worker nodes to the Windows Azure storage account. For more information about the command syntax, see hpcpack.

Note

Because log files on worker nodes are named identically, you should avoid overwriting files when you upload them to the Windows Azure storage account. For example, you can rename the log files with names that include the host name of the node, as shown in the following examples.

Example 3. To upload and rename the hpcworker_000000.bin files from Windows Azure worker nodes to the container MyLogs in the Windows Azure storage account named MyStorageAccount with a primary key named MyPrimaryKey:

@echo off
REM Get the host name of the Windows Azure node
FOR /F "usebackq" %%i IN (`e:\approot\mpiexec.exe -c 1 hostname`) DO SET filename=%%i
set "extension=.bin"
set "fullpath=C:\logs"
REM Build the renamed log file path (e.g., C:\logs\AzureCN-001.bin)
set "fullfilePath=%fullpath%\%filename%%extension%"
REM echo:%fullfilePath%
REM Create a temporary file with desired name
copy C:\logs\hpcworker_000000.bin %fullfilePath%
e:\approot\hpcpack upload %fullfilePath% /account:MyStorageAccount /container:MyLogs /key:MyPrimaryKey
del %fullfilePath%

Example 4. To upload a script uploader.bat (similar to the script in Example 3) from the head node to the container named MyLogs in a Windows Azure storage account named MyStorageAccount, download the script to the Windows Azure nodes in the node group named WorkerNodes, run uploader.bat on those nodes, and then delete the script from the nodes:

hpcpack upload uploader.bat /account:MyStorageAccount /container:MyLogs /key:MyPrimaryKey
clusrun /nodegroup:WorkerNodes hpcpack download uploader.bat /account:MyStorageAccount /container:MyLogs /key:MyPrimaryKey /path:c:\logs
clusrun /nodegroup:WorkerNodes c:\logs\uploader.bat
clusrun /nodegroup:WorkerNodes del c:\logs\uploader.bat

View log data in a Windows Azure storage account

To view logs that are in Windows Azure table or blob storage, you can browse storage by using the Windows Azure Tools for Visual Studio. For more information, see Browsing Storage Resources with Server Explorer.

You can also use a non-Microsoft tool such as Azure Storage Explorer.

See Also

Concepts

Burst to Windows Azure with Microsoft HPC Pack

hpcazurelog