Monday, August 30, 2021

E-commerce Bot Economics


What does supply and demand have to do with bots?

For this post, I am not talking about bots that perform attacks such as SQL injection (SQLi) or Account Takeover (ATO). This post will strictly explore the grey area of web scraping and cart checkout bots. For folks who have studied economics, you are likely familiar with the tried and true supply and demand curve. A quick refresher can be found on Wikipedia.

Xbox and PS5 bots

There are a number of tools that exist for going through a checkout process to buy products. The bot operators typically fall into a few different buckets:

The motivation for an organization to buy thousands or even hundreds of thousands of PS5s is... drumroll... MONEY! There is an enormous opportunity to buy low and sell high with this highly sought-after product.

Into the economics

There is a substantial delta between what customers are willing to pay for a product and what the product is initially sold for. In a free marketplace, the high demand would be a signal to charge customers more for the product. Charging a higher sticker price lets demand fall to meet supply and ultimately results in higher profits for the business. More importantly, this removes the profit margin that can exist in secondary markets. A bot management solution can be a helpful tool for raising the barrier to entry for a bot operator. However, if there is enough of a margin between the price the primary seller charges and the price the end consumer is willing to pay, then bot operators will find a way around any bot management solution. The ultimate solution is to close the gap between the primary market and any secondary or end consumer tooling markets.
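The margin argument above can be put into rough numbers. The following is a minimal sketch, not a model of any real market: the class, the function, and every figure used with them are hypothetical and only illustrate how the price gap and the cost of bot tooling interact.

```python
from dataclasses import dataclass


@dataclass
class ResaleScenario:
    """Hypothetical inputs; none of these figures come from real data."""
    retail_price: float          # what the primary seller charges
    resale_price: float          # what buyers pay on the secondary market
    marketplace_fee_rate: float  # fraction of the resale price kept by the resale platform
    cost_per_attempt: float      # proxies, captcha solving, etc. per checkout attempt
    success_rate: float          # fraction of checkout attempts that succeed


def expected_profit_per_unit(s: ResaleScenario) -> float:
    """Expected profit for one successfully purchased and resold unit.

    On average 1 / success_rate attempts are needed per unit, so bot
    mitigation hurts the operator by lowering success_rate or by
    raising cost_per_attempt.
    """
    if not 0 < s.success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    attempts_per_unit = 1 / s.success_rate
    tooling_cost = attempts_per_unit * s.cost_per_attempt
    net_resale = s.resale_price * (1 - s.marketplace_fee_rate)
    return net_resale - s.retail_price - tooling_cost
```

With a made-up retail price of 500, a resale price of 800, a 10% marketplace fee, 0.50 per attempt, and a 5% success rate, the operator still clears roughly 210 per unit. Making the bot ten times less successful only brings that down to roughly 120, while raising the retail price to 700 brings it down to roughly 10, which is the point of the paragraph above.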

Primary market sellers cannot always raise prices

There are a number of marketplaces where the primary sellers cannot raise prices. For example, on ticketing platforms the ticket prices are often set by the artists who will be performing. Many artists set ticket prices substantially below market value in the hope that "real fans" will be able to buy the tickets. Bot operators then buy the tickets and resell them at the market rate on a secondary market for a healthy profit, and the fans get frustrated with the ticket selling vendor. The artists are then able to deflect any criticism about ticket prices onto the ticket selling vendor for not implementing sufficient anti-scalping measures.

What is the solution?

The real solution to this problem is an economic one. There are two options: either increase supply or reduce demand. To continue with the venue ticketing example, increasing supply is a simple concept. If artists do substantially more shows so that more seats are available, then supply goes up. This is simple in concept, but not easy or desirable to implement, since it requires much more work from the artists and supporting staff for less of a return. This is the approach that Kid Rock took. I would encourage everyone to listen to this podcast for details on that use case. The other approach is to reduce demand, which is also an easy concept. Taylor Swift did this by raising her ticket prices. The following link contains those details.

These solutions are great if the seller has the ability and the will to make these changes, but often the seller has a deficit in one of these areas. As a consequence, sellers are forced to examine how to increase the barriers to entry to make it less profitable for secondary markets to operate. An effective bot management solution and identity management solution can be a critical piece in increasing the costs for secondary markets. However, if the gap between the price customers are willing to pay and the primary seller's price remains wide, then secondary markets will remain profitable.

Where can this mindset be applied?

I would argue that the true customer for companies that sell products at below market value is the supplier. The initial sellers need the ability to charge more for these high demand products, or they will be forced to add more friction to the buying process to try to combat the secondary market. This economic approach can be applied to really any highly sought-after product, including shoes (sneaker bots), baby clothes, GPUs, airline tickets, and PS5s.

Bot Operator Tooling

The tooling that is used to buy the product from the primary marketplace varies, and often starts with basic scripts built on items like the following:

  •     Browser plug-ins
  •     Python Requests and Beautiful Soup
  •     Selenium
  •     Anti-captcha solutions to defeat captchas
  •     Residential and mobile network proxies


Tuesday, December 8, 2020

Why is referrer missing in Google Analytics?

Learn from my mistakes and save some time. I spent way more time on this than needed. If your referrer is missing from your Google Analytics and you use Google Tag Manager, then make sure you set the field alwaysSendReferrer to true per the screenshot below.

This field is necessary to always send the referrer as the dr parameter in web requests to Google Analytics. For more information on the referrer, see the following link. 
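To make the dr parameter more concrete, here is a minimal sketch of what a Universal Analytics pageview hit looks like with and without a referrer. The helper name, tracking ID, client ID, and URLs are placeholders I made up for illustration; the tag itself builds this request for you, so this only shows where dr ends up.

```python
from urllib.parse import urlencode

# Universal Analytics Measurement Protocol collection endpoint
GA_COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview_hit(tracking_id, client_id, page_url, referrer=None):
    """Return the URL of a pageview hit (hypothetical helper for illustration)."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID, e.g. UA-XXXXX-Y
        "cid": client_id,    # anonymous client ID
        "t": "pageview",     # hit type
        "dl": page_url,      # document location
    }
    if referrer:
        # dr carries the document referrer; when it is left out of the
        # hit, Google Analytics has nothing to report as the referrer
        params["dr"] = referrer
    return f"{GA_COLLECT_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_pageview_hit(
        "UA-XXXXX-Y",
        "555",
        "https://shop.example.com/cart",
        referrer="https://news.example.org/article",
    ))
```

Checking the outgoing request in the browser's network tab for a dr value is a quick way to confirm whether the setting took effect.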

Have a great day!


Monday, September 28, 2020

Chromedriver really wants to use the default profile

Today I learned that when chromedriver is pointed at an existing Chrome profile, Selenium (or chromedriver) will do one of the following:

1. If a "Default" directory exists inside the user data directory, then that directory will be used.
2. Otherwise, a "Default" directory will be created, unless a profile directory is specified.

Take for example the following snippet:

from selenium import webdriver


def build_browser():
    chromeOptions = webdriver.ChromeOptions()
    # Pointing --user-data-dir at the profile folder itself does not select that profile
    chromeOptions.add_argument("--user-data-dir=/Users/myuser/Library/Application Support/Google/Chrome/Profile 2")

    browser = webdriver.Chrome(options=chromeOptions)

    return browser

If you run the above, it will not load the existing profile. However, I did see that the new directory "/Users/myuser/Library/Application Support/Google/Chrome/Profile 2/Default" now existed, with all of the expected settings of a brand new Chrome profile. To make my profile work with Selenium, I needed to make a small update.

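from selenium import webdriver


def build_browser():
    chromeOptions = webdriver.ChromeOptions()
    # --user-data-dir is the parent folder that holds all of the profiles
    chromeOptions.add_argument("--user-data-dir=/Users/myuser/Library/Application Support/Google/Chrome/")
    # --profile-directory is the name of the profile folder inside it
    chromeOptions.add_argument("--profile-directory=Profile 2")

    browser = webdriver.Chrome(options=chromeOptions)

    return browser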

I had to specify the profile directory name using the "--profile-directory" argument. The correct profile started working after I made the change.
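If you only have the full path to a profile folder, the split into the two arguments can be automated. This is a small sketch under the assumption that the profile folder sits directly inside the user data directory, as it did for me; the helper names are my own and are not part of Selenium.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def split_chrome_profile_path(profile_path: str) -> tuple[str, str]:
    """Split a full profile path into (user data dir, profile directory name)."""
    path = PurePosixPath(profile_path)
    return str(path.parent), path.name


def profile_arguments(profile_path: str) -> list[str]:
    """Build the two Chrome arguments for the given profile path."""
    user_data_dir, profile_directory = split_chrome_profile_path(profile_path)
    return [
        f"--user-data-dir={user_data_dir}",
        f"--profile-directory={profile_directory}",
    ]


if __name__ == "__main__":
    for argument in profile_arguments(
        "/Users/myuser/Library/Application Support/Google/Chrome/Profile 2"
    ):
        print(argument)
```

Each returned string can be passed to chromeOptions.add_argument as in the snippet above.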

Happy automating!

Sunday, February 23, 2020

Why consistent tooling matters

I have always known that there is a lot of tribal knowledge within the team that I am a part of. I also understand that siloed knowledge is undesirable, but until recently I did not really see the extent to which this can be problematic. We became too comfortable with going through inefficient workflows to solve problems because that is what we were used to. Tooling could have been built to make the workflows more efficient, but the cost in time to build out the tools compared to going through the sub-optimal workflows was not worth paying in the moment. It was much easier to just train people on the nuances of how the systems worked and then demonstrate the sub-optimal way to complete the task. However, this methodology fell apart and did not scale once it became necessary to aggressively on-board new engineers.

Once it became a requirement to on-board product support engineers to the product that I was helping maintain, I was essentially starting from the ground up with a bunch of new hires for a geographically dispersed team. The existing product support team members are tech savvy, but the technology that I had been supporting works very differently from anything that they had worked with previously. As I was doing some of the training and explaining how to troubleshoot cases, I would often get the question "How do you know to look at x system when y problem happens?". I was really bothered by these types of questions because the answers were largely undocumented product quirks that the existing Support team and I had stumbled upon, remembered only through the pain of troubleshooting the low level mechanics of what could be causing an issue.

It is important for all team members to use the same methodology when troubleshooting similar cases. The lack of tooling created a prerequisite to understand the low level details of systems before they could even begin to understand the approach to solving technical issues. Without consistent tooling there was a wide variety of processes and methodologies for solving cases. Everyone was left to their own devices to figure out what they saw as the best way to solve cases. Useful tooling does a few things:
  1. Provides mechanisms for troubleshooting specific issues
  2. Provides a shared language for how issues have been troubleshot and what troubleshooting to do next
  3. Reduces on-boarding time
I will go through each of the 3 items above since they require their own more in-depth explanation.

Provides mechanisms for troubleshooting specific issues

PKI is a very large topic with many complex pieces. It is a common task for Support Engineers to check the validity of a certificate. My preferred way of checking the validity of a certificate is to use the tooling in OpenSSL. Below is the function for reference:

function checkcert {
  # Usage: checkcert <hostname>  (connects to port 443)
  echo "---------------------------"
  echo "Certificate Valid Dates"
  echo "---------------------------"
  true | openssl s_client -connect "$1:443" 2>/dev/null | openssl x509 -noout -dates
  echo "---------------------------"
  echo "Certificate CN and DNS Info"
  echo "---------------------------"
  true | openssl s_client -connect "$1:443" 2>/dev/null | openssl x509 -noout -text | grep DNS:
  echo "---------------------------"
  echo "Issuer"
  echo "---------------------------"
  true | openssl s_client -connect "$1:443" 2>/dev/null | openssl x509 -noout -issuer
  echo "---------------------------"
  echo "Check Hostname Validity"
  echo "---------------------------"
  # Prints the verification result that s_client reports for the connection
  true | openssl s_client -connect "$1:443" 2>/dev/null | grep 'Verify return code'
}

If a support engineer has to validate a certificate, then it is as easy as running checkcert with the hostname from the command line. I am assuming that SNI is not a factor for this example. There is no need to open a browser to grab a screenshot of text. The output of the command already provides the most relevant information for identifying the validity of a certificate in a format that is easy to consume and share. Having information formatted in a way that is easy to consume is a critical part of the next point.
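The same idea can be wrapped in a script when the result needs to feed other tooling. Below is a rough sketch using only the Python standard library; the function names are my own. One difference from checkcert to be aware of: the default SSL context verifies the chain and hostname up front and raises an error on failure, instead of printing a verify return code.

```python
from __future__ import annotations

import socket
import ssl
from datetime import datetime, timezone


def fetch_peer_cert(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect to host:port and return the server certificate as a dict.

    Raises ssl.SSLCertVerificationError if the certificate does not verify.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def days_until_expiry(cert: dict, now: datetime) -> float:
    """Days between `now` (timezone-aware) and the certificate's notAfter date."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(cert["notAfter"]), tz=timezone.utc
    )
    return (expires - now).total_seconds() / 86400


def dns_names(cert: dict) -> list[str]:
    """DNS entries from the subjectAltName field, like the `grep DNS:` step."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


if __name__ == "__main__":
    cert = fetch_peer_cert("example.com")
    print("Expires in days:", round(days_until_expiry(cert, datetime.now(timezone.utc)), 1))
    print("DNS names:", ", ".join(dns_names(cert)))
```

Keeping the network call separate from the date and name helpers makes it possible to check the helpers against a saved certificate without connecting to anything.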

Provides a shared language

IT systems are complex and the roles between organizations are highly specialized. Having a clear and consistent way of communicating between people will result in issues being fixed more quickly. Using similar tooling builds trust and confidence when transitioning work items from one individual to another in a team. Also, there is less of a need to explain the steps that have been taken so far to troubleshoot a behavior when the methodology that is followed is consistent across a team. Just like a real language, the shared and consistent follow through can feed into how well interactions go with external teams as well. For example, if it is expected that logs will be formatted a certain way when requests are sent to the Product team, then tooling can be built to accommodate this need. Building out tooling to format logs in a more consumable way reduces the amount of time needed to do the formatting manually and also reduces the likelihood that the incorrect format will be used. This is especially important if using the incorrect format may cause delays through unnecessary back and forth on the tickets.
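As an illustration of the log formatting example, here is a minimal sketch of such a helper. The target format (UTC timestamp, system, severity, and message separated by pipes) is purely hypothetical, as is the function name; the real format would be whatever the Product team asks for.

```python
from datetime import datetime, timezone


def format_log_entry(timestamp: datetime, system: str, severity: str, message: str) -> str:
    """Render one log entry in a single agreed-upon layout (hypothetical format)."""
    if timestamp.tzinfo is None:
        # Refuse naive timestamps so nobody has to guess the time zone later
        raise ValueError("timestamp must be timezone-aware")
    ts = timestamp.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Collapse line breaks and repeated whitespace so each entry stays on one line
    clean_message = " ".join(message.split())
    return f"{ts} | {system} | {severity.upper()} | {clean_message}"


if __name__ == "__main__":
    print(format_log_entry(
        datetime(2020, 2, 23, 14, 5, 0, tzinfo=timezone.utc),
        "edge-proxy",
        "warn",
        "  TLS handshake\nfailed ",
    ))
```

Once everyone pastes logs through the same helper, the receiving team never has to ask which time zone or which system an entry came from.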

Reduces on-boarding time

The previous points really feed into this last item, which I believe is the most important. Turnover is a part of any company and no one should attempt to completely eliminate turnover. Instead, focus on creating a great on-boarding program and reducing the amount of time necessary for someone to become proficient at their key responsibilities. For example, if I told an entry level new hire "OpenSSL is the only acceptable way to do certificate validation and if you have questions then check out the 'man' page", then I would be failing as someone responsible for on-boarding. This is because the goal is to do certificate validation, not to become an expert at OpenSSL (which is still a good skill to have). We have to think about how to abstract the low level details where it makes sense so that more time and energy may be spent on higher level tasks that add value to the company and customers.

I hope you enjoyed the post. Please leave questions and feedback in the comments below.