Thursday, January 18, 2024

Considerations on MSS when using a GRE tunnel from a CDN to a CDN origin


When a GRE (Generic Routing Encapsulation) tunnel is used between the CDN and the CDN origin, it can impact the Maximum Segment Size (MSS) of the TCP connections due to the additional overhead introduced by the GRE encapsulation. Here's how it works:

GRE Tunnel Overhead

  • GRE encapsulation adds additional headers to each packet: a new outer IP header (20 bytes) plus the GRE header itself (4 bytes), for 24 bytes of overhead in total. An optional GRE key adds another 4 bytes.
  • This extra overhead reduces the amount of space available in each packet for the actual payload data.

Impact on MSS

  • The MSS in a TCP connection signifies the largest amount of data, in bytes, that a computer or communications device can receive in a single TCP segment. It does not include the TCP header or IP header.
  • Under normal circumstances without a GRE tunnel, the MSS is calculated based on the Maximum Transmission Unit (MTU) of the network path, typically 1500 bytes for Ethernet networks. From this, the IP header (20 bytes) and TCP header (typically 20 bytes) are subtracted, resulting in a default MSS of 1460 bytes.
  • With a GRE tunnel, the additional GRE header needs to be accounted for. If the network path's MTU remains at 1500 bytes, the MSS must be reduced accordingly to accommodate the GRE header. For example, with a 24-byte GRE header, the MSS would need to be reduced to 1436 bytes (1500 - 20 for IP header - 20 for TCP header - 24 for GRE header).
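The arithmetic above can be sketched in a few lines. This is a minimal illustration assuming IPv4, no TCP options, and no GRE key; the constant names are my own:

```typescript
// Sketch: how GRE overhead shrinks the usable TCP MSS for a fixed path MTU.
const MTU = 1500;           // typical Ethernet MTU, in bytes
const OUTER_IP_HEADER = 20; // outer IPv4 header added by the tunnel
const GRE_HEADER = 4;       // base GRE header (a GRE key would add 4 more)
const INNER_IP_HEADER = 20; // the original packet's IPv4 header
const TCP_HEADER = 20;      // TCP header without options

// Without a tunnel: MTU minus IP and TCP headers.
const plainMss = MTU - INNER_IP_HEADER - TCP_HEADER;

// With a GRE tunnel: the 24-byte encapsulation overhead must also fit
// inside the same 1500-byte MTU, so the MSS shrinks accordingly.
const greOverhead = OUTER_IP_HEADER + GRE_HEADER;
const tunnelMss = MTU - greOverhead - INNER_IP_HEADER - TCP_HEADER;

console.log(plainMss, tunnelMss); // 1460 without the tunnel, 1436 with it
```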


Consequences of Not Adjusting the MSS

  1. Fragmentation:

    • If the MSS is not adjusted for the GRE overhead, packets might exceed the MTU, leading to fragmentation. Fragmentation can reduce performance and increase the likelihood of packet loss.
  2. Reduced Efficiency:

    • A smaller MSS means that more packets are required to send the same amount of data. This can lead to increased overhead and reduced efficiency in data transmission.
  3. TCP Performance:

    • TCP performance might be impacted due to the reduced MSS. TCP throughput can be less efficient with smaller packet sizes, especially over long-distance links where latency is a factor.

Adjusting MSS

  • Network administrators can adjust the MSS using TCP MSS clamping, a technique used in routers and firewalls to modify the MSS value within the TCP SYN packets. This ensures that the end devices establish connections using an MSS that takes into account the GRE overhead, preventing issues like fragmentation.
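As a sketch, MSS clamping on a Linux router might look like the following. The `TCPMSS` target in the mangle table is standard iptables; the interface name `gre1` is an assumption, and 1436 matches the 1500-byte-MTU example above:

```shell
# Clamp the MSS advertised in forwarded TCP SYN packets so connections
# through the GRE tunnel fit within the path MTU (1500 - 24 - 40 = 1436).
iptables -t mangle -A FORWARD -o gre1 -p tcp --tcp-flags SYN,RST SYN \
  -j TCPMSS --set-mss 1436

# Alternatively, let iptables derive the value from the path MTU:
iptables -t mangle -A FORWARD -o gre1 -p tcp --tcp-flags SYN,RST SYN \
  -j TCPMSS --clamp-mss-to-pmtu
```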

In summary, using a GRE tunnel between the CDN and CDN origin impacts the MSS due to the additional header overhead of the GRE encapsulation. This necessitates an adjustment of the MSS to avoid fragmentation and maintain efficient TCP performance. Proper network configuration, including MSS clamping, is essential to handle this change effectively.

Saturday, July 2, 2022

Staying logged into a web site when using puppeteer

Do you want to stay logged into a site after manually logging in with puppeteer?

Based on feedback from the following GitHub thread, this task is not as easy as just specifying the user data directory.

The workaround is to save cookies after logging in and then load those cookies in subsequent sessions. Here is the answer from GitHub.
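A minimal sketch of that workaround, assuming puppeteer's `page.cookies()` and `page.setCookie()` API; the file path and the narrow page types here are illustrative, not from the thread:

```typescript
import * as fs from "fs";

// Save the current session's cookies to disk after a manual login.
async function saveCookies(
  page: { cookies(): Promise<object[]> },
  path: string
): Promise<void> {
  const cookies = await page.cookies();
  fs.writeFileSync(path, JSON.stringify(cookies, null, 2));
}

// Restore the saved cookies in a later session, before navigating,
// so the site sees you as already logged in.
async function loadCookies(
  page: { setCookie(...cookies: object[]): Promise<void> },
  path: string
): Promise<void> {
  const cookies = JSON.parse(fs.readFileSync(path, "utf8"));
  await page.setCookie(...cookies);
}
```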

Saturday, June 25, 2022

What is trust in cyber security?

What is trust? The philosophy of trust is fascinating. The definition of trust is "Assured reliance on the character, ability, strength, or truth of someone or something".

Where does the trustor's belief come from? My view is that the trustor must have past experience or a foundational belief to inform the level of trust or distrust they will place in the trustee. Without some prior knowledge going into an interaction, there is likely to be no foundation for trust or distrust. Every adult human on the face of the earth has a weighted perspective on how much they believe a trustee will act in a way that is beneficial to the trustor.

So far this post has been a very philosophical discussion. Why am I even talking about the concept of trust? In information security, trust is the bedrock for nearly every interaction (or at least it should be). When a person attempts to redeem a gift card online, the web application owner assumes that if valid card details are entered, then the person entering them must be the owner of the gift card. The online shop will then add the balance of the gift card to the person's account. It would be amazing if the world were this simple, but online fraudsters take advantage of this inherent trust. Fraudsters can brute force gift card validation endpoints using automation (aka bots) to redeem balances on gift cards.

How can a website owner distinguish between legitimate users and fraudsters for these cases?

There are a number of identifiers that are available with a given request including:

  • Source of the traffic (IP and GeoIP)
  • Owner of the source traffic (IP ASN owner and OrgName)
  • HTTP request headers
  • Rate of traffic

This is not an exhaustive list, and there are many subcategories of these fields that could be utilized as well. This is especially true with rate of traffic. If a million requests are sent within a 5-minute time frame from a single user, most web applications would likely consider that abusive or fraudulent. Those million requests would not be considered trustworthy since the rate vastly exceeds normal user usage. However, there are applications where this may be acceptable behavior. What qualifies as "normal" requires some prior knowledge or an explicit definition.
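A rate-of-traffic check like the one described can be sketched as a sliding window per source IP. This is purely illustrative; the class name, window size, and threshold are my own assumptions, not from the post:

```typescript
const WINDOW_MS = 5 * 60 * 1000;     // 5-minute window, as in the example
const DEFAULT_THRESHOLD = 1_000_000; // requests per window considered abusive

// Track request timestamps per source IP and flag sources whose rate
// vastly exceeds the defined "normal" baseline.
class RateTracker {
  private hits = new Map<string, number[]>();

  constructor(private threshold: number = DEFAULT_THRESHOLD) {}

  // Record a request; returns true when the source exceeds the threshold.
  record(ip: string, nowMs: number): boolean {
    const recent = (this.hits.get(ip) ?? []).filter(
      (t) => nowMs - t < WINDOW_MS
    );
    recent.push(nowMs);
    this.hits.set(ip, recent);
    return recent.length > this.threshold;
  }
}
```

The key point from the post survives in the code: the threshold has to be chosen from prior knowledge of what "normal" looks like for that application.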

We cannot infer trust in a vacuum. We must rely on prior knowledge to judge whether something or someone is trustworthy. As cyber security continues to progress in the fight against fraud, it will be interesting to see how an individual's history of good or bad behavior is recorded.

Here are some questions that I have, maybe for a later post.

If my online persona violates the ToS for a site, then does that get recorded somewhere? Should it? When should it be a requirement for my real or true identity to be used for interacting with a site instead of having the ability to use a persona?

Monday, December 20, 2021

Risk of incorrectly classifying people as bots

I really like two different shows. The Netflix show Black Mirror illustrates various dystopian futures and Reply All is usually an upbeat show that reports on a wide spectrum of topics.

Recently Reply All put out the episode State of Panic, in which the critics of a Florida politician were the recipients of varying degrees of unwanted attention, to put it lightly. The unwanted attention came in the form of a large volume of undesirable direct messages (DMs) and tweets directed at the critics. The messages had very focused wording, which could give the impression that the communication was being driven by a small set of individuals managing a large number of bot-operated Twitter accounts. However, after some investigation by the good folks at Reply All, it was identified that there were actually a number of very zealous Twitter followers of the politician. The Florida politician, while holding some controversial beliefs, had deeply connected with many people online.

Below is a short synopsis of the Men Against Fire episode of Black Mirror from Wikipedia.

The episode follows Stripe (Malachi Kirby), a soldier who hunts humanoid mutants known as roaches. After a malfunctioning of his MASS, a neural implant, he discovers that these "roaches" are ordinary human beings. In a fateful confrontation with the psychologist Arquette (Michael Kelly), Stripe learns that the MASS alters his perception of reality.

For the soldier, it is much easier to eliminate the "roaches" than to eliminate real people. Once the soldier becomes aware that his actions are impacting people instead of "roaches", he starts to empathize and question his overall mission. In the case of Reply All's State of Panic episode, the recipient of the harassment viewed the messages as originating from bot-managed accounts, since it was difficult to believe that so many real people would hold that specific set of beliefs. While harassment is certainly not acceptable, we still have to acknowledge that these are real people's voices. When we incorrectly classify public web discourse as bot traffic, we minimize the viewpoints of those individuals. The viewpoints may be misguided or factually incorrect, but they are still the perspectives of those people.

Large numbers of fake accounts managed by a small set of individuals can absolutely be a problem. Platforms should take steps to reduce the influence of fake accounts where possible. However, we should also avoid the knee-jerk reaction of classifying controversial opinions as originating from "bots". Otherwise, we will fail to understand the viewpoints of a large population of individuals.