Saturday, September 15, 2018

Resilient Storefront Optimal Gateway Routing with GSLB

Pre-Requisites Before reading further, if you don’t know what StoreFront Optimal Gateway Routing (OGR) is, STOP and check out: If you need a refresher on DNS or NetScaler GSLB, then STOP and review eDocs and relevant CTX article and resources. For information on NetScaler GSLB, see the links below: And if you want to see how DNS works in your own environment, then check out one of my favorite ways to tools to troubleshoot and learn about DNS with http://digwebinterface.com/. Try resolving your own site with the “trace” option checked and unchecked. Abstract The purpose of this blog is provide an overview on how GSLB can be used to provide a redundant solution with StoreFront Optimal Gateway Routing (OGR). A few questions will be answered in this blog for a multi-datacenter design.
  1. How can we use a single URL for external access?
  2. How can we send users to specific datacenter where users’ unique data or backend application dependencies reside?
  3. How can we deploy a resilient solution to protect against a datacenter outage?
Let’s get into it Acme is our example customer. Acme has three (3) datacenters, NY, LA, and Atlanta (ATL). User’s on the East coast have non-persistent desktops, but have their roaming profile on a file server at NY. User’s on the West coast have non-persistent desktops, but have their roaming profile on a file server at LA. All users have access to a unique application that has backend requirements at the ATL datacenter. So the answer to question number one is fairly straight forward. Use GSLB with NetScaler Gateway. The common external FQDN can be “access.acme.com”. Three (3) NetScaler Gateway VIPs will reside at each datacenter and will be included in the GSLB configuration. If we didn’t care where we sent users, then we could stop here. However, with the scenario outlined above, we want to avoid scenario where a NY user (user data is in NY) launches a desktop that is proxied by the LA Datacenter. This user would then have their Citrix session go from client’s locations –> LA NetScaler Gateway –> NY datacenter, which is certainly not the optimal route. More importantly this routing will utilize precious private site-to-site bandwidth and could be detrimental to a user’s experience. This is where OGR comes into play. Let’s answer question #2. How can we send users to a specific datacenter where their unique data or backend application dependencies reside? We want to use site prefixes to make each site unique. For those of you already thinking ahead, yes, a SAN certificate is required for this solution. Below are the prefixes we will use for the example:
  • NY NetScaler Gateway VIP = ny.access.acme.com
  • LA NetScaler Gateway VIP = la.access.acme.com
  • ATL NetScaler Gateway VIP = atl.access.acme.com
With OGR and under normal work conditions we can direct users accessing a XD Site at NY will be proxied by the NY NSG (ny.access.acme.com), users accessing a XD Site at LA will be proxied by the LA NSG (la.access.acme.com), and users accessing the unique application (AutoCAD) will be proxied by the ATL NSG (atl.access.acme.com). With this solution, users are able to authenticate at any site and launch applications that will utilize the public WAN to cross the nation, instead of using potentially costly MPLS connections. How are users able to authenticate at NY, but still able to launch apps from ATL? Using STAs of course! All NetScaler appliances in the environment will need to be able to communicate with all of the same STAs. For information on STAs, please see: http://support.citrix.com/article/CTX101997. With this configuration, authentication and application enumeration are separate events from application launch. It is key to understand that fact. Authentication can occur anywhere, but application launch is more granularly specified with unique site prefixes and OGR. So let’s answer question # 3 and add some resiliency. How can we deploy a resilient solution to protect against a datacenter internet outage? What happens when a construction company’s backhoe accidently severs Acme’s internet POP in LA while laying down city infrastructure, but Acme’s MPLS connection remains intact? A unique GSLB vServer exists for each of the site unique prefixes. A separate GSLB vServer also exists for “access.acme.com”. Configuring the “access.acme.com” vServer as a backup vServer for all of the GSLB vServers with the site prefix will protect the individual and unique FQDNs against a datacenter failure. For example, when the LA datacenter’s internet connection is broken, the NetScaler appliances at NY and ATL will recognize an outage via either MEP or explicit health monitors. Users are then sent to the available NY and ATL NSG when resolving “la.access.acme.com” and “access.acme.com”. Users can then be proxied through the internal MPLS via the available sites. If the MPLS (or other private site to site connection) went down, then StoreFront can be configured with DR (http://support.citrix.com/proddocs/topic/dws-storefront-26/dws-configure-ha-lb.html), but we will save that talk for another day ;-). I have included some diagrams to help clarify things. A key things to keep in mind is that authentication and application launch are two completely separate events and workflow. The diagram below is for the authentication and application enumeration workflow.   Blog_Auth_Workflow_01   The diagram below illustrates the application launch workflow. The thick lines represent normal working conditions. The dotted lines represent the backup workflow in the event that site is experiencing an outage.   Blog_App_Launch_01   Thank you for reading. I hope you found this beneficial. Please let me know if you have questions in the comments below. BC  

Customize your monitoring with the XenDesktop Director API and Python

On a day-to-day basis I assist with the operations of a Citrix environment with 100+ individual XenDesktop sites (small offices). With Director, only a single site is visible at a time. I would have to select each site individually to find out if there are any failure events at a location. For 100+ sites this would be extremely tedious and time consuming. Wouldn’t it be great if there was a way to look through all unique sites and find out if there’s a failure? Heck yeah it would!

Our Solution

What I did was create a Python script that does just that. The script consumes a text file with a list of all of the XML Brokers and asks each broker “What is your current your failure count?”. If the failure count is greater than 0 (zero), which means there’s a failure, then open up IE and navigate to that site for further investigation. The way the script is written, it will iterate and re-iterate through the list until the script is manually stopped. This way I can leave it running all day and if there’s an issue IE will pop up prompting me to login, and pause for 30 seconds. Director-Logon   If no issues are found at the queried XML Broker, then the script will wait 5 seconds and move to the next XML Broker in the list. If you need to manually stop the script, then use CTRL+C or just close the window where the application is being executed. When an error is found, then I can logon on to the Director server and begin troubleshooting. Director-Error Each query is targeted at the URL “/Citrix/Monitor/OData/v2/Data/FailureLogSummaries”. The first XML tag is what the script is looking for because it contains the current failure count. For documentation on what information is contained in this field, please visit eDocs http://support.citrix.com/proddocs/topic/xendesktop-7/cds-ms-odata-wrapper.html. The location of the text file for me is “D:\temp\ddcFile.txt”. You may modify the variable “ddcFile” to your specific file location. The file lists DDC as such. ddc1.mycompany.net ddc2.mycompany.net ddc3.mycompany.net Here is the Python code is below.
import requests
import time
import xml.etree.ElementTree as ET
import requests.auth
from requests_ntlm import HttpNtlmAuth
import getpass
import webbrowser

#use this for username\password

username = raw_input("Enter Domain\\Username :")
password = getpass.getpass("Enter Password :")

#xml namespaces
ns = {'default': "http://www.w3.org/2005/Atom",
    'base': "http://192.168.0.112/Citrix/Monitor/OData/v2/Data/",
    'd': "http://schemas.microsoft.com/ado/2007/08/dataservices",
    'm': "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"}

#Used for automatically launching IE when a failure is detected.
ie = webbrowser.get(webbrowser.iexplore)

class DdcMonitor:
    def main(self):
        while True:
            #opens the file with the list of DDCs
            ddcFile = open('D:\\temp\\ddcFile.txt', 'r')
            for ddcFQDN in ddcFile:
                #uses HTTP. HTTPS could be added if needed.
                directorURL = "http://" + ddcFQDN.rstrip("\n") + "/Citrix/Monitor/OData/v2/Data/FailureLogSummaries"
                print("Now probing : " + str(directorURL))
                #Connection information
                #here is an example of a constructed query
                #directorURL = "http://192.168.0.112/Citrix/Monitor/OData/v2/Data/FailureLogSummaries"
                directorSession = requests.session()
                directorSession.auth = HttpNtlmAuth(username,password)
                directorReqData = directorSession.get(directorURL)

                #XML information
                root = ET.fromstring(directorReqData._content)
                entry = root.find('default:entry', ns)
                sub_1 = entry.find("default:content", ns)
                for sub_2 in sub_1.find("m:properties", ns):
                   if "FailureCount" in str(sub_2.tag):
                       if int(sub_2.text) > 0:
                           print("")
                           print("The Failure Count is increasing at " + directorURL)
                           print("The error count is currently :   " + sub_2.text)
                           print("Waiting 30 seconds")
                           ie.open('http://' + ddcFQDN + "/director")
                           print("")
                           time.sleep(30)
                       else:
                           print("The Failure Count is not increasing at " + directorURL)

                print("The probe will run again in 5 seconds")
                print("")
                print("")
                time.sleep(5)

            ddcFile.close()
            time.sleep(1)

try:
    DdcMonitor().main()
    print("the program is no longer running")
except:
    print("Something caused the program to stop. Please restart the program")
Is there any information that you would like to monitor from your Citrix deployment on an hourly, daily, or weekly basis? Let me know in the comments below. Thanks for reading! BC This software / sample code is provided to you “AS IS” with no representations, warranties or conditions of any kind. You may use, modify and distribute it at your own risk.

Troubleshooting Cryptic NetScaler “Internal Error” When Installing or Replacing a Certificate Key Pair

Abstract The purpose of this blog is to assist with troubleshooting the “Internal Error” error when installing or replacing certificate key pairs. Let’s dive in No matter how many times you click the “install” button the certificate key pair just will not install. Sound familiar? I’ve encounters this issue, many, many, many times. The root cause is always one of the following:
  1. The certificate or private key are formatted improperly.
  2. The certificate does not correspond to the correct private key.
Let’s go into each of these. 1. The certificate or private key are formatted improperly. Sometimes extra spaces can throw off the import of a certificate/key pair. Personally, I try to put all certificates into a PEM format. NetScaler OpenSSL can be used to accomplish this. For keys use the following command. Use the “–inform DER” option for DER encoded keys.

Openssl rsa –in -out

Along the same lines, use the following command to reformat the certificate file:

Openssl x509 –in -out

Try installing the certificate/key pair again. If that doesn’t work then… 2. The certificate does not correspond to the correct private key. I encounter this much more than people would think. If the certificate was generate from a CSR that was not signed by the correct private key, then the certificate will never install. Ever. How can we check if the certificate was generated from a CSR that was signed by the correct key? By checking the modulus of the private key, CSR, and certificate of course :-)! The modulus of the private key and certificate must be 100% exact matches. For an explanation as to why this is the case, please seehttps://en.wikipedia.org/wiki/RSA_(cryptosystem). To view the modulus of a private key, use the following command. Openssl rsa – modulus –noout –in To view the modulus of a csr:

Openssl req – modulus –noout –in

To view the modulus of a certificate:

Openssl x509 –modulus –noout –in

Below is a sample output for what the modulus will look like for my public certificate. NetScaler_modulus FIPS NetScaler Appliances So what happens if you can’t access your private key to run OpenSSL commands (FIPS NetScaler appliances)? Remember that the Key, CSR, and Cert MUST all use the same modulus if they are related. With this theory in mind, you can generate a bogus CSR off of the FIPS key to see what the modulus should be for the public certificate. If you generate a CSR off of a FIPS box and the modulus for that CSR does not match the modulus for the Public Certificate that was returned to you, then that certificate did not use a CSR that was generated off of that FIPS key. I hope this information has helped. The OpenSSL commands I have listed are only a handful of my favorites. For a really good cheat sheet on useful commands (I have it bookmarked), check out the link below. https://www.sslshopper.com/article-most-common-openssl-commands.html Let me know if you have questions in the comments below! BC

Prevent a DOS via user lockouts at NetScaler Gateway

Before we begin let me first say… All NetScaler Gateway landing page customizations are unsupported. Changing the NetScaler Gateway landing page will cause you to have an unsupported environment. I do not condone malicious attempts to lockout user accounts. The purpose of this article is to highlight a current risk and mitigation steps. Now that the disclaimer is out of the way. Let’s start with the customizations :-). The current recommended configuration for two-factor authentication at NetScaler is available here. http://support.citrix.com/article/CTX125364 With the configuration highlighted in the article above. Web based users that authenticate are hitting AD first. Ideally, we would want to follow the authentication workflow that is configured for the Native Receiver. The Native Receiver evaluates RADIUS first, and if this is successful, then the LDAP policy is invoked. What is the risk of leaving the configuration exactly how the article has outlined the configuration? If Bob, a malicious user, knows Alice’s username, then Bob could enter a bogus password 3 times and lock Alice’s account. Bob could do this as often as he liked until some measure went into place to stop Bob. If Bob knew a lot of usernames and had some knowledge of scripting tools such as JMeter, then he could lockout a large number of user accounts effectively acting as a DOS. This would be bad, and I again, I would not condone such an attack.  So what can we do to mitigate such a risk? The quick and easy way to do it is to reverse the web authentication policies so that they match up with the Native Receiver (RADIUS as primary, LDAP as secondary). However, this will force users to enter their RADIUS passcode before entering their AD username. Most organizations want to have the dynamic pin as the 2nd password for users to enter. So how can we mitigate the risk AND have the dynamic token as the second password users need to enter? Like in the quick and easy method, we would need to make the RADIUS authentication primary and the LDAP authentication secondary. Now we need to customize some JavaScript on the NetScaler. The file /vpn/login.js is what we need to customize. This file can be found under“/netscaler/ns_gui/vpn/login.js”. What we will do is change the ordering of the POST values. The JavaScript below has the original values in red that we will change.
function ns_showpwd_default() { var pwc = ns_getcookie(“pwcount”); document.write(‘’ + _(“Password”)); if ( pwc == 2 ) { document.write(‘ 1′); } document.write(‘:’); document.write(‘passwd” size=”30″ maxlength=”127″ style=”width:100%;”>’); if ( pwc == 2 ) { document.write(‘’ + _(“Password2″) + ‘ passwd1” size=”30″ maxlength=”127″ style=”width:100%;”>’); } UnsetCookie(“pwcount”); }
  The JavaScript below contains the revised fields so that when a user POSTs their credentials, NetScaler will can evaluate RADIUS before attempting to contact AD. The values passwd1 andpasswd are swapped.  
  function ns_showpwd_default() { var pwc = ns_getcookie(“pwcount”); document.write(‘’ + _(“Password”)); if ( pwc == 2 ) { document.write(‘ 1′); } document.write(‘:’); document.write(‘passwd1” size=”30″ maxlength=”127″ style=”width:100%;”>’); if ( pwc == 2 ) { document.write(‘’ + _(“Password2″) + ‘ passwd” size=”30″ maxlength=”127″ style=”width:100%;”>’); } UnsetCookie(“pwcount”); }
With this configuration, we can remove an avenue for would-be attackers who intend to lockout users. Also, below are some relevant links for NetScaler Gateway customizations. http://support.citrix.com/article/CTX125364 http://support.citrix.com/article/CTX126206 http://support.citrix.com/proddocs/topic/netscaler-gateway-101/ng-connect-custom-theme-page-tsk.html Have you worked at an organization that has come under attack from user lockouts? What have you done to mitigate the threat? Let me know in the comments below and feel free to ask questions! Thanks for reading, BC