Security and coding tutorials by Darius Pirvulescu

SSE, Clerk, and rotating tokens: A debugging story

2026-02-17T00:00:00+00:00

This is the story of a subtle workflow error that polluted our logs and kept triggering pointless alerts. It lingered in the background long enough. I decided to track it down and fix it for good. Enough is enough.

In my client’s system, we rely on Server-Sent Events (SSE) to stream web notifications to active users on our client apps. Something along this flow triggered 401 Unauthorized errors constantly.

The SSE workflow

Client app - opens a connection (type text/event-stream) to receive these notifications
Server - authenticates the user, subscribes them to the message stream, and sends events (messages and/or data) to the client
The connection’s lifetime ends when either the client disconnects or the server cancels the connection

As our auth provider, we use Clerk. Our frontend apps, built with Vite, use the Clerk SDK (@clerk/clerk-react).

The problem

At certain intervals during this process, the SSE requests failed with 401 Unauthorized. When users were active on the app, the number of failures compounded. This triggered the alerts we configured in the Azure Monitoring portal, and those alerts were sent to our Slack. It became annoying as they clouded other real alerts from our system.

And the Azure logs:

The symptoms

I noticed the first request failed, followed by successful requests. What made that first request fail? I looked at the difference between them.
A few symptoms started to stand out in those failed requests:

The request type was plain instead of eventsource
The bearer token was different. This was a tell-tale sign, as the new token didn’t change in the successful requests.
The failed request seemed to be triggered whenever the app came back into view (switching back to its tab)

This last symptom made me think it was probably related to an event firing. While digging around DevTools for clues, I looked at the stack traces. Comparing the stack traces of the two requests offered plenty of clues about what went wrong.

The stack trace from the 401 requests:

create
fetch.ts:105:40
onVisibilityChange
fetch.ts:145:9
(Async: EventListener.handleEvent) fetchEventSource/<
fetch.ts:67:12
fetchEventSource
sse.ts:64:13
connect
RealtimeNotificationsProvider.tsx:122:22
initializeSSE

And the one resulting in 200:

create
fetch.ts:105:40
fetchEventSource/<
fetch.ts:145:9
fetchEventSource
fetch.ts:67:12
connect
sse.ts:64:13
scheduleReconnect/this.reconnectTimeout<
RealtimeNotificationsProvider.tsx:122:22
(Async: setTimeout handler) scheduleReconnect
RealtimeNotificationsProvider.tsx:125:10
onerror
react-dom-client.development.js:25989:20
create

I noticed that onVisibilityChange appears right before the 401 error. This sheds some light: the failure might be related to the browser’s visibility event.

After googling for “onVisibilityChange fetch”, I found https://github.com/Azure/fetch-event-source#readme among the first results. It is the same library we use to handle SSE.
In their README:

In addition, this library also plugs into the browser’s Page Visibility API so the connection closes if the document is hidden (e.g., the user minimizes the window), and automatically retries with the last event ID when it becomes visible again

Aha! Just like in the symptom I observed, it connects when the app comes into focus. So while the user is not looking at the app (so to speak) the connection is closed. As soon as the app is visible again, it retries to connect.

The code

It was still not very clear why the authorization failed. Time to paste the code here. Well, parts of the code:

 async connect(): Promise<void> {
    if (this.isConnecting) {
      return;
    }

    this.isConnecting = true;

    try {
      const token = await this.options.getToken();
      if (!token) {
        this.isConnecting = false;
        return;
      }

      [...]

      await fetchEventSource(this.options.url, {
        method: 'GET',
        headers: {
          Authorization: `Bearer ${token}`,
          Accept: 'text/event-stream',
        },
        signal: this.abortController.signal,
        onmessage: event => {
          [...]
        },
        onerror: error => {
          // Only reconnect if not aborted (user-initiated disconnect)
          [...]
        },
      });
    } catch (error) {
      // Only reconnect if not aborted (user-initiated disconnect)
      [...]
    } finally {
      this.isConnecting = false;
    }
  }

I kept experimenting until I noticed the token gets cached. We pass the headers to fetchEventSource, and those same headers are reused when the visibility changes: they’re passed down to the library’s internal create() call.

So we pass the valid token for SSE and it works well. But it eventually fails, and the token gets changed by then.
Can the token be rotated while we “look away” (the app in the background)? I went through the Clerk docs and finally I found the root cause.

The bug

Clerk issues tokens with a very short lifespan (just 60 seconds) and relies on a token refresh mechanism that is triggered automatically by the frontend SDKs.

This, combined with the token being cached, was the cause of all those 401 Unauthorized errors.

1. SSE connects with token A (valid) - the token is passed to fetchEventSource
2. User switches tabs - the library closes the connection
3. Clerk rotates the token - token A expires, replaced by token B
4. User returns to the tab - the internal onVisibilityChange handler fires and tries to reconnect, reusing the same static headers object from step 1. It sends the expired token A
5. Our server returns 401 (token A is expired)
6. The 401 triggers our onerror, which eventually calls getToken(). This returns the fresh token B, and the next attempt succeeds

The fix

I needed to make sure fetchEventSource always has the latest token, so a refresh mechanism was required.
Helpfully, the library allows passing a custom fetch.
So instead of static headers, I could pass my own fetch method. This way I can construct the headers dynamically and ensure the connection always uses the latest credentials.

My new code looks like this:

[...]
      await fetchEventSource(this.options.url, {
        method: 'GET',
        fetch: async (input, init) => {
          const freshToken = await this.options.getToken();
          const headers = new Headers(init?.headers);
          headers.set('Authorization', `Bearer ${freshToken}`);
          headers.set('Accept', 'text/event-stream');
          return globalThis.fetch(input, { ...init, headers });
        },
        signal: this.abortController.signal,
        onmessage: event => { [...] },
        onerror: error => { [...] },
      });
[...]

The wins

Investigating this was a pleasant and rewarding journey, one that got me more used to diving into source code.
The change itself is small, but one that compounds: every user, every tab switch, every day.

Right now, there is less noise in our server logs, and fewer “fake” alerts get triggered. We have better visibility: the real issues stand out instead of getting buried in the noise. We can focus on what matters.
Most importantly, this improved the user experience. When users switch back to the app, notifications arrive instantly. No more waiting for the 5-second reconnectDelay to kick in.

Loading related DB entities in EF Core

2026-01-30T00:00:00+00:00

For my latest client, I’ve been working with C# and ASP.NET Core, using Entity Framework (EF) Core as the ORM. This gave me the chance to explore how relationships between entities are modeled and how EF Core loads related data.

Choosing the right data loading strategy directly impacts the database queries, resource consumption, response times, and even code clarity.
These are the main strategies:

Eager - related entities are loaded together with the parent ones
Explicit - related entities are loaded when you decide to load them
Lazy - related entities are loaded when you try and access them

I’ll illustrate the difference with practical code examples.

The project

I set up a Warehouse API with these entities:

namespace Warehouse.Api.Entities;

public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
    private ICollection<Order>? _orders;
    public virtual ICollection<Order> Orders => _orders ??= [];
}

public class Order
{
    public int Id { get; set; }
    public int CustomerId { get; set; }
    public virtual Customer Customer { get; set; } = null!;    
    public virtual List<Item> Items { get; init; } = [];
    public virtual ICollection<OrderItem> OrderItems { get; init; } = [];
}

public class OrderItem
{
    public int Id { get; set; }
    public int OrderId { get; set; }
    public virtual Order Order { get; set; } = null!;
    public int ItemId { get; set; }
    public virtual Item Item { get; set; } = null!;
    public int Quantity { get; set; } = 1;
}

public class Item
{
    public int Id { get; set; }
    public required string Name { get; set; }
    public required decimal Price { get; set; }
}

This API has a CustomerService that handles fetching the Customer with their data from the DB. It has a method for each loading type, implementing the interface:

namespace Warehouse.Api.Services;

using Entities;

public interface ICustomerService
{
    Task<List<Customer>> GetCustomersEagerAsync();
    Task<List<Customer>> GetCustomersExplicitAsync();
    Task<List<Customer>> GetCustomersLazyAsync();
}

I also added some extension methods to (pretty) print the records:

public static void Print(this Order order)
{
    Console.WriteLine($"    Order #{order.Id}");
}

The strategies

Eager Loading

This approach loads all related data in a single database query. Eager loading prevents additional round-trips to the database.
In EF Core, this is achieved with the Include() method to load the child entity, followed by ThenInclude() if loading any nested entities. Under the hood, EF Core translates this into SQL JOIN operations.

public async Task<List<Customer>> GetCustomersEagerAsync()
{
    Console.WriteLine("~~~ EAGER LOADING START ~~~");
    var customers = await _dbContext.Customers
        .Include(c => c.Orders)
            .ThenInclude(o => o.OrderItems)
                .ThenInclude(o => o.Item)
        .AsNoTracking()
        .AsSplitQuery()
        .ToListAsync();

    // All the data already loaded

    Console.WriteLine("~~~ EAGER LOADING END ~~~");
    return customers;
}

To optimize performance, I added the AsNoTracking() method. This pulls the data in a “read-only” mode and speeds up queries. It skips setting up the change tracker.

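A note for future reference: when a query includes several collection navigations (multiple Include() calls), a single JOIN query can suffer from cartesian explosion. AsSplitQuery() avoids this by having EF Core issue a separate SQL query for each included collection instead of one large JOIN.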

Explicit Loading

This allows developers to decide exactly when to load data. Calling the Load method triggers the ORM to query for the navigation property. Explicit loading offers fine-grained control over when to load data based on runtime conditions.

The helper methods used for different scenarios:

Reference() - single navigation property
Collection() - collection

public async Task<List<Customer>> GetCustomersExplicitAsync()
{
    Console.WriteLine("~~~ EXPLICIT LOADING START ~~~");
    var customers = await _dbContext.Customers.ToListAsync();

    foreach (var customer in customers)
    {
        if (customer.Id <= 3)
        {
            // Explicitly load the orders
            await _dbContext.Entry(customer).Collection(o => o.Orders).LoadAsync();
        }
        customer.Print();

        foreach (var order in customer.Orders)
        {
            order.Print();

            if (order.Id <= 4)
            {
                // Also explicitly load the orderItems
                await _dbContext.Entry(order).Collection(o => o.OrderItems).LoadAsync();
            }

            foreach (var orderItem in order.OrderItems)
            {
                // Explicitly load the single navigation property - item
                await _dbContext.Entry(orderItem).Reference(o => o.Item).LoadAsync();
                orderItem.Print();
            }
        }
    }

    Console.WriteLine("~~~ EXPLICIT LOADING END ~~~");

    return customers;
}

Lazy Loading

The main entity is loaded first, and the related ones are only loaded when the navigation property is accessed.

The ORM makes use of proxies (dynamic classes) to trigger queries as they intercept any access to the navigation property. The proxies are not enabled by default, so for this, I installed the Microsoft.EntityFrameworkCore.Proxies NuGet package. Then, I included the option to enable lazy loading:

builder.Services.AddDbContext(options =>
{
    options.UseLazyLoadingProxies()
        .UseSqlite(connString);
});

I also had to change the entities from sealed record to class, and mark some navigation properties with the virtual keyword.

public async Task<List<Customer>> GetCustomersLazyAsync()
{
    Console.WriteLine("~~~ LAZY LOADING START ~~~");
    var customers = await _dbContext.Customers
        .ToListAsync();

    foreach (var customer in customers)
    {
        customer.Print();
        foreach (var order in customer.Orders) // Orders accessed, loading Orders
        {
            order.Print();
            foreach (var item in order.Items) // Items accessed, loading Items
            {
                item.Print();
            }
        }
    }

    Console.WriteLine("~~~ LAZY LOADING END ~~~");
    return customers;
}

This might be problematic, triggering the infamous N+1 problem. Here, we execute another database query each time we access a navigation property. This can easily grow to a high number of queries, increasing execution time and workload. Eventually, this might bring the database to a halt.
Let’s say we have ten customers with ten orders each. These are the queries we produce:

customers - 1 Customer query
orders    - 10 Order queries
items      - 100 Items queries

Total of 111 queries

So we will execute 111 database queries to fetch the ten customers with their orders and items.

Which one to use?

The decision to pick one over the other depends on various factors like the database used, the data access patterns, performance requirements, how much data to load upfront, etc.

Eager loading is great when you know upfront what related data you need, or when the related data is (almost) always accessed. It minimizes the number of queries, which makes the access more predictable. But it may fetch data that’s not being used, and it has a larger initial query.

Explicit loading is useful for conditional loading or for data retrieval based on user actions (e.g., pressing a “Show Details” button). It offers the most control over data loading.
But all that manual loading requires more complex, maybe ugly, code. It is easier to get null references when you forget to load some data. And it is also susceptible to the N+1 issue.

Usually, you want to avoid Lazy Loading in favor of eager loading, especially if you always access the nested data. Lazy loading creates unnecessary data round-trips and makes performance issues harder to debug. It doesn’t work with AsNoTracking() and requires changing the entities (virtual) and proxy support. As the codebase grows and more data relations are added, it requires monitoring for N+1 issues.
That said, lazy loading has some strengths. It provides a faster initial load (main entity), keeps the code simpler, and it’s favored when related data is rarely accessed. It is also convenient for rapid development.
For embedded databases like SQLite, the cost of individual queries is much lower compared to client-server databases. Since SQLite doesn’t have the network overhead, the performance impact of lazy loading is reduced.

In large applications, the usual choice is a combination of these strategies. Each scenario has different access patterns and constraints. So picking the strategy case by case is better than relying on one strategy alone.

Exploring the ANSI escape injection in Active Record logging [CVE-2025-55193]

2025-08-18T00:00:00+00:00

Last week, two security patches were added to Rails. One of them was meant to guard against the ANSI escape injection [CVE-2025-55193], a vulnerability affecting Active Record logging. I was curious what an attacker could achieve by exploiting this vulnerability. Here, I logged my findings and created a simple PoC.

The sink is here. This line prints to the console, and the id is a user-controlled parameter, so it should not be trusted.

                                                                         ⌄
raise(RecordNotFound.new("Couldn't find #{name} with '#{primary_key}'=#{id}", name, primary_key, id))
>>>                                                                      ^bad boiii

It is not exploitable under most circumstances, and the impact is reduced on most terminals. However, it may still increase the attack surface, particularly if there are misconfigurations.

The affected versions:

activerecord >= 8.0, < 8.0.2.1 (patched in 8.0.2.1)
activerecord >= 7.2, < 7.2.2.2 (patched in 7.2.2.2)
activerecord >= 0, < 7.1.5.2 (patched in 7.1.5.2)

Please upgrade to one of the patched Rails versions: 7.1.5.2, 7.2.2.2, or 8.0.2.1.

I wanted to see how it can be triggered, so for this, I set up a basic Rails app at the vulnerable version 7.1.0.

mkdir ansi-vulnerable
cd ansi-vulnerable
echo "source 'https://rubygems.org'" > Gemfile
echo "gem 'rails', '7.1.0'" >> Gemfile
bundle install

# Check the rails version
bundle exec rails -v

Then I created the Rails app, the DB, and a placeholder scaffold.

bundle exec rails new . --force --skip-bundle
bin/rails g scaffold book title:string
bin/rails db:create db:migrate

bin/rails s # localhost:3000

Locally, I am using the xterm-256color term. So the payload for this might differ based on your terminal.

The escape sequences

I’ll not get into too many details on these.
In 1967, the “C0” control character set was first defined (ISO 646).
But soon after, in the 70s, video terminals were the new cool thing. They could display colors/styles/formats, move the cursor around, modify previously written text, etc.
There was a need for standardization of the code performing these “magic” features. This was achieved by ECMA-48 (1976), ANSI X3.64 (1979), and ISO 6429 (1983).
The terms “ANSI escape sequences” and “ANSI control sequences” are often used interchangeably, but the control ones are actually a subset of the escape sequences.

Back to the Control Characters (Cc), there are two types:

C0 - the first 32 non-printable characters of the ASCII table (defined initially in ISO 646).
C1 - an additional 32 Ccs, available in both 7-bit and 8-bit encodings. The 8-bit set is more straightforward and encodes each Cc in a single byte; it spans from 128 to 159 (decimal). 7-bit systems cannot encode values above 127 in a single unit, so each C1 character is represented by the ESC character followed by one character between decimal 64 and 95.

The format of a Control Sequence:

CSI Pn In F

CSI - Control Sequence Introducer (\x1b\x5b, i.e. ESC followed by “[”, in 7-bit form, or the single byte \x9b in 8-bit form)
Pn - Parameter bytes (optional, any number of bytes in the range \x30 to \x3f, with parameters separated by “;”)
In - Intermediate bytes (optional, any number of bytes in the range \x20 to \x2f)
F - Final byte (a single byte in the range \x40 to \x7e)

Alternative notation

You might see the string \e[32m represented as:

printf '\x1b\x5b\x33\x32\x6d' - Hex
printf '\033\133\063\062\155' - Octal
printf '\u001b\u005b\u0033\u0032\u006d' - Unicode
printf '\27\91\51\50\109' - Decimal
printf '\e[32m' - ASCII

Note: the true Decimal notation would be plain numbers without the \ character.
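
A quick way to check that these notations describe the same five bytes is to compare them in Ruby. This is only a sketch: it relies on Ruby’s double-quoted strings supporting the hex, octal, Unicode, and \e escapes, and the variable names are mine.

```ruby
# Each literal below spells ESC [ 3 2 m using a different escape notation.
hex     = "\x1b\x5b\x33\x32\x6d"
octal   = "\033\133\063\062\155"
unicode = "\u001b\u005b\u0033\u0032\u006d"
ascii   = "\e[32m"

# The plain decimal code points, packed into a byte string.
decimal = [27, 91, 51, 50, 109].pack("C*")

# Compare raw bytes so string encodings do not get in the way.
same = [hex, octal, unicode, decimal].all? { |s| s.bytes == ascii.bytes }

puts ascii.bytes.inspect
puts same
```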

The payload

I tested with this payload:

\x1b\x5b3;32;44m hello \x1b\x5b0m

Here:

\x1b and \x5b - CSI
3;32;44 - Pn
- 3 - italics
- 32 - color green
- 44 - background color blue
0 - resets the style
m - the final byte, which selects the function to call (here SGR, Select Graphic Rendition)
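
To make the breakdown concrete, here is a minimal Ruby sketch that splits a 7-bit control sequence into its parts. The regular expression follows the byte ranges listed in the format section; parse_csi and CSI_PATTERN are hypothetical names, and the sketch ignores the 8-bit \x9b form.

```ruby
# CSI (ESC [), then parameter bytes, intermediate bytes, and one final byte.
CSI_PATTERN = /\e\[(?<params>[\x30-\x3f]*)(?<intermediates>[\x20-\x2f]*)(?<final>[\x40-\x7e])/

# Returns the parts of the first control sequence found, or nil if none.
def parse_csi(text)
  match = CSI_PATTERN.match(text)
  return nil unless match

  {
    params: match[:params].split(";"),
    intermediates: match[:intermediates],
    final: match[:final]
  }
end

payload = "\e[3;32;44m hello \e[0m"
first = parse_csi(payload)
puts first.inspect
```

For the payload above, the first sequence yields the parameters 3, 32, and 44, no intermediate bytes, and the final byte m.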

PAYLOAD=$(printf '\x1b\x5b3;32;44m hello \x1b\x5b0m')

wget "http://localhost:3000/books/$PAYLOAD"

This triggers the RecordNotFound error of ActiveRecord which prints the requested ID to the console. Being vulnerable to ANSI escape injection, it prints the styling as well.

This is enough to demonstrate the vulnerability.

I researched what other things an attacker might be able to do. These depend on the terminal:

\x1b\x5b20F hello - move the cursor 20 lines up
\x1b\x5b10M hello - delete 10 lines
\x1b]8;;http://example.com\e\\This is a link\e]8;;\e\\\n - print links in the victim’s terminal
\x1b]52;c;c2xlZXAgMQplY2hvICQod2hvYW1pKQ== - clipboard injection (injecting echo $(whoami))
\x1b[?1001h\x1b[?1002h\x1b[?1003h\x1b[?1004h\x1b[?1005h\x1b[?1006h\x1b[?1007h\x1b[?1015h\x1b[?10016h\ - print mouse tracking values in the terminal

In some rare cases, it might even open up the possibility for remote command execution.

The patch

To fix this, the Rails team added a call to .inspect on the id before printing it to the console (commit 3beef20).
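
To see why this helps, here is a small Ruby sketch. It is not the Rails code itself, only a string built in the same shape as the error message, with Book standing in for the scaffolded model. String#inspect renders the ESC byte as the two printable characters \ and e, so the terminal no longer interprets it.

```ruby
# A hostile "id" carrying SGR sequences, like the payload tested earlier.
id = "\e[3;32;44m hello \e[0m"

raw_message  = "Couldn't find Book with 'id'=#{id}"
safe_message = "Couldn't find Book with 'id'=#{id.inspect}"

# The raw message still contains the ESC byte; the inspected one does not.
puts raw_message.include?("\e")
puts safe_message.include?("\e")
puts safe_message
```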

Resources

https://nicholas-morris.com/articles/ansi-codes - Great read
https://invisible-island.net/xterm/ctlseqs/ctlseqs.html - Documentation to all the sequences xterm supports
https://www.youtube.com/watch?v=opW_Q7jvSbc - Weaponizing Plain Text: ANSI Escape Sequences as a Forensic Nightmare

5 simple steps to a lean Docker image

2025-07-07T00:00:00+00:00

Docker is a tool I often use, both for developing personal projects and also during my Cybersec studies. Recently, I researched how Docker builds an image and discovered ways to limit the image size.

This came after I set up a separate, basic VPS for testing stuff that quickly ran out of storage. Instead of simply upgrading the VPS storage, I went for frugality and optimized my Docker images.

Here are some basic first steps you can take to limit the image size. I tried keeping these steps language agnostic, but I’ll use a Node app to exemplify the concepts.

The build command I used:

docker build --no-cache -t node-api:v0 .

After each step, you’ll see the image size and build time. Notice that the build time may vary based on your system, network connection, time of day, and how depressed your machine is.

Initial build

I’m starting from this Dockerfile:

FROM node:latest

WORKDIR /usr/src/node-api

COPY package*.json ./
RUN npm install --verbose

COPY . .

RUN npm run build

CMD ["npm", "start"]

Size:
node-api     v0        1.43GB
Time:
Building 17.2s (12/12)

Steps

1. Ignore files

Add a .dockerignore file in the root directory.
This speeds up the build and also prevents sensitive files from showing up in the final image.

Here are some more details on syntax.

Size:
node-api     v1        1.37GB
Time:
Building 15.3s (12/12)

2. Base Image

This has a major impact on the size of the final image.
The main frameworks offer different image tags to use. Go for the leaner images, as you can save on storage, but this may come with a caveat.

For Node, use alpine images instead of latest.
Alpine-based images are popular for their minimal size and smaller vulnerability count. They are not officially supported by the Node team though. See the list of unofficial Node builds.
The Alpine project uses musl to implement the C standard library, whereas Debian’s Node.js tags (for instance bullseye or slim) rely on glibc. For this reason, Alpine might cause compatibility issues with dependencies that include native code.
However, alpine will suffice for most projects.

I downloaded some common Node image tags to compare them:

REPOSITORY               TAG               SIZE
node                     latest            1.13GB
node                     bookworm          1.13GB
node                     bullseye          1.03GB
node                     slim              230MB
node                     alpine            165MB

Additionally, you can use distroless base images. They contain only your app with its runtime dependencies. The package managers, shells, and others are skipped. Using them dramatically decreases the image size and its attack surface.
This approach is more advanced and out of the scope of this article.

Size:
node-api     v2        408MB
Time:
Building 17.1s (12/12)

3. Multi-stage build

This allows you to separate the build and runtime envs. You can include only the essential files in the final image.

A Dockerfile accepts multiple FROM statements. Each FROM instruction begins a new stage of the build (and can use a different base image). And each stage can be named with the AS keyword.

Here is my updated Dockerfile:

# Build stage
FROM node:alpine AS build
WORKDIR /usr/src/node-api
COPY package*.json ./
RUN npm install --verbose
COPY . .
RUN npm run build

# Prod stage
FROM node:alpine
WORKDIR /usr/src/node-api
COPY --from=build /usr/src/node-api/build ./build
COPY package*.json ./
RUN npm install --verbose

CMD ["npm", "start"]

You can stop the build at a specific stage using the --target flag:

docker build --target build -t node-api:v3 .

Size:
node-api     v3        234MB
Time:
Building 8.6s (15/15)

4. Skip dev dependencies

Besides multi-stage, you can further skip dev dependencies during install.

For Node, always use ci (clean install) instead of i. This command is more efficient and installs the exact versions based on package-lock.json. It throws an error and exits for any version mismatch. It also accepts an --omit flag to skip some dependencies.

RUN npm ci --omit=dev

Size:
node-api     v4        172MB
Time:
Building 6.3s (16/16)

5. Merge layers and cleanup between them

We can clean the temporary files that are created after a RUN instruction. It is common for package managers to install additional components and keep a local cache. We can save space by:

Instructing the package manager to install the minimum dependencies
Removing the cache after installation, or instructing the package manager to disable the cache altogether

For example, after installing Node dependencies, npm creates metadata files that take up space in the image. We can use these commands to remove them:

RUN npm cache clean --force
RUN rm -rf /tmp/* /var/cache/apk/*

For Debian/Ubuntu, use --no-install-recommends. apt keeps its package list cache at /var/lib/apt/lists, which can be removed afterwards:

RUN apt-get install -y --no-install-recommends
RUN apt-get clean
RUN rm -rf /var/lib/apt/lists/*

For Python (pip), we can specify the same with --no-cache-dir.

Limit the number of layers

Each RUN, COPY, and ADD instruction in a Dockerfile creates a new layer in the image. Docker uses an overlay-type file system, stacking these layers on top of each other.
In the examples above, the cleanup runs in its own RUN instruction, so the files are only hidden by a later layer. They still exist in the earlier layer, and the image size does not decrease.

We can merge the RUN commands to avoid this. If we do the cleanup before the RUN command is completed, the files we want deleted will not end up in the image:

RUN npm ci --omit=dev && \
    npm cache clean --force && \
    rm -rf /tmp/* /var/cache/apk/*

Size:
node-api     v5        170MB
Time:
Building 6.7s (16/16)

Summary

And my final Dockerfile:

# Build stage
FROM node:alpine AS build
WORKDIR /usr/src/node-api
COPY package*.json ./
RUN npm ci && \
    npm cache clean --force && \
    rm -rf /tmp/* /var/cache/apk/*
COPY tsconfig.json ./
COPY src ./src
RUN npm run build

# Prod stage
FROM node:alpine
WORKDIR /usr/src/node-api
COPY --from=build /usr/src/node-api/build ./build
COPY package*.json ./
RUN npm ci --omit=dev && \
    npm cache clean --force && \
    rm -rf /tmp/* /var/cache/apk/*


CMD ["npm", "start"]

Measuring image sizes

You can get the image size by listing the images:

docker images node-api

For more advanced insight, a useful tool is dive:

dive node-api

It offers a TUI for interactively exploring a Docker image. You can see each layer in detail, check for wasted space, and identify where you can further optimize. It breaks down each layer, including which files were added and their size.

Alternatively, I found out you can just create the image without running it. Then, you can export its contents and inspect them manually:

docker create node-api:v5
docker container list -a
docker export <container-id> > node-api.tar

Or:

docker export $(docker create node-api:v5) > node-api.tar

Safeguard against DoS in Rails helper

2025-04-28T00:00:00+00:00

One recent contribution to the Rails codebase caught my attention. It concerns the distance_of_time_in_words method. The fix is meant to prevent a possible Denial of Service while using this method.

The contribution was brought by Stazer. I found out about the PR in the newsletter This week in Rails.

The problem

The distance_of_time_in_words method returns the approximate distance in time between two timeframes (can be Time, Date, or DateTime objects or integers) and displays it in a nice, humanized format. To be correct, the leap years between those two timeframes should be considered. It uses count and a range to get the number of leap years.

[...]
leap_years = (from_year > to_year) ? 0 : (from_year..to_year).count { |x| Date.leap?(x) }
[...]

This is a blocking process. The calculation can take a long time if the distance between from_year and to_year is big enough.
Users might be able to trigger this DoS if they can set a timestamp which is then being passed to distance_of_time_in_words.

I found it interesting how subtle this vulnerability is. The contributor encountered this problem in one of their personal projects and decided to open a PR to Rails.

The fix

This contribution safeguards against DoS. It calculates the leap years in constant time.

fyear = from_year - 1
(to_year / 4 - to_year / 100 + to_year / 400) - (fyear / 4 - fyear / 100 + fyear / 400)
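
The idea behind the formula: year / 4 - year / 100 + year / 400 counts the leap years from year 1 up to a given year, so subtracting the count up to from_year - 1 leaves the leap years in the inclusive range. Here is a minimal Ruby sketch that cross-checks it against the counting approach on a small range. The helper names leap_years_up_to and leap_years_between are mine, not from the Rails patch.

```ruby
require "date"

# Leap years in 1..year, using integer division (Gregorian rules).
def leap_years_up_to(year)
  year / 4 - year / 100 + year / 400
end

# Closed-form count for the inclusive range from_year..to_year.
def leap_years_between(from_year, to_year)
  return 0 if from_year > to_year

  leap_years_up_to(to_year) - leap_years_up_to(from_year - 1)
end

# Cross-check against the original counting approach on a small range.
from_year = 1896
to_year = 2024
counted = (from_year..to_year).count { |y| Date.leap?(y) }
puts leap_years_between(from_year, to_year) == counted
```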

I will present how you can test this fix locally.

Testing this fix

For this, I created a new, minimal Rails app:

rails new my_awesome_app --minimal

Then I wanted to override the distance_of_time_in_words method. So I created this new file:

# config/initializers/actionview.rb

require 'action_view'

module ActionView::Helpers::DateHelper
  alias __distance_of_time_in_words distance_of_time_in_words
  private :__distance_of_time_in_words

  def distance_of_time_in_words(_from_time, _to_time = 0, _options = {})
    [...]
    leap_years = if from_year > to_year
      0
    else
      fyear = from_year - 1
      (to_year / 4 - to_year / 100 + to_year / 400) - (fyear / 4 - fyear / 100 + fyear / 400)
    end
    [...]
  end

  def old_distance_of_time_in_words(_from_time, _to_time = 0, _options = {})
    [...]
    leap_years = (from_year > to_year) ? 0 : (from_year..to_year).count { |x| Date.leap?(x) }
    [...]
  end
end

I copied the rest of the method’s code from the Rails repo.

I’m now able to test the fix straight in the Rails console:

require "benchmark"

num_years = 100_000_000.years
Benchmark.bm do |x|
  x.report("old") { 
    ApplicationController.helpers.old_distance_of_time_in_words(Time.now, Time.now + num_years)
  }
  x.report("new") {
    ApplicationController.helpers.distance_of_time_in_words(Time.now, Time.now + num_years)
  }
end

         user     system      total        real
old  6.095959   0.000000   6.095959 (  6.096444)
new  0.000117   0.000000   0.000117 (  0.000117)
[...]

Here we can see the big difference. The old code counts the leap years one by one, which took around 6 seconds here.
As the number of years between the two dates increases, the computing time grows linearly. When I tested it with a range of 1,000,000,000 years (ten times larger), it took 61 seconds. This has the potential to bring the application to a halt.
The updated code performs the calculation in constant time, regardless of the number of years.

DNS lookup from scratch

2025-02-26T00:00:00+00:00

My findings after implementing the DNS query without any library. This domain name system is nicely tucked away in the network drawers, so you don’t even notice it. Nonetheless, it is used by everyone on the internet multiple times a day.

Also called the “phone book of the internet”, DNS helps translate from human-readable hostnames (example.com) to computer-friendly IP addresses (23.192.228.80).

While learning, I put together a toy project, rbdig, written in Ruby, as I’m more comfortable with this language. Due to refactoring, the project’s code might not exactly match the code snippets presented here.

The steps I’ll describe:

Building the DNS request
Creating a socket and sending the DNS request
Receiving and parsing the DNS reply
Handling the recursive queries myself

I guided myself using this official document, RFC1035, to construct the DNS request and parse the response.

Step 1: Building the DNS request

The DNS request has two parts:

header (12 bytes)
question (variable length)

I wanted to see how this is done by other tools. With the help of Netcat, I captured a DNS lookup from dig. I also used Wireshark to view the UDP packet, as it does a good job of representing network packets.

# Start a listener on port 2020 (saving the output to a file)
nc -u -l 2020 > dns_lookup.txt

# Send a dig request to that port
dig +retry=0 -p 2020 @127.0.0.1 +noedns example.com

# Tip: you can use nc to forward the request to a DNS server (ex: Cloudflare's)
nc -u 1.1.1.1 53 < dns_lookup.txt > resp_dns_lookup.txt

This is the whole request sent by dig (as hex bytes):

840f01200001000000000000076578616d706c6503636f6d0000010001

But what does it all mean?

The first 12 bytes are the header: 840f01200001000000000000. I wrote the method that builds it with one line per field, each with a comment.
The header format is described in section 4.1.1 of RFC1035, where you can find further info on what each field means.

def query_header
  query_id = "\x84\x0f" # 2 random bytes. When we get the response, the same bytes should be included
  flags = "\x01\x00"    # standard query with the Recursion Desired (RD) bit set
  qd_count = "\x00\x01" # the # of entries in the question section
  an_count = "\x00\x00" # the # of resource records in the answer section
  ns_count = "\x00\x00" # the # of name server resource records (in authority records section)
  ar_count = "\x00\x00" # the # of resource records in the additional records section
  query_id + flags + qd_count + an_count + ns_count + ar_count
end
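To check the field layout, the header captured from dig can be decoded back into numbers. This is a small sketch that only uses the hex string shown above; the variable names are mine. Note that dig's flags are 0x0120 rather than 0x0100: as far as I can tell, the extra 0x0020 is the AD bit defined by later DNSSEC RFCs, which dig sets by default.

```ruby
# The request captured from dig, as a hex string.
captured_hex = "840f01200001000000000000076578616d706c6503636f6d0000010001"

# Turn the hex digits back into raw bytes.
raw = [captured_hex].pack("H*")

# The header is six big-endian 16-bit integers (12 bytes in total).
query_id, flags, qd_count, an_count, ns_count, ar_count = raw[0, 12].unpack("n6")

puts format("query_id=0x%04x flags=0x%04x", query_id, flags) # => query_id=0x840f flags=0x0120
puts "questions=#{qd_count} answers=#{an_count} authority=#{ns_count} additional=#{ar_count}"

# Bit 0x0100 of the flags is Recursion Desired (RD).
rd_set = (flags & 0x0100) != 0
puts "RD set? #{rd_set}" # => RD set? true
```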

Now that the header is handled, I could move to the next section.
The question section is made of:

question name - the actual domain name we’re looking for
query type - the type of record we’re looking for (ex: “A” for IPv4 record)
query class - the class of record we’re looking for (ex: “IN” for the INternet)

The more complex part was building the question name. DNS has a format for encoding domain names. It follows a sequence of labels. Each label is made of a length octet + that number of octets. The domain name is terminated with a null label \x00.

A domain name such as www.example.com becomes 3www7example3com0. My code for this:

def encode_domain(domain)
  enc = domain.strip.split('.').map { |s| [s.length].pack("C") + s }.join
  enc + "\x00"
end

Here is a screenshot of the request in Wireshark. If you want to reproduce this, set Wireshark to listen to the loopback interface and filter for the right udp.port.

I put all the encoding logic in the DNSQuery class.
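The class itself isn't listed here, so the following is only a hypothetical sketch of how the header and question could be assembled into one message; build_query and its keyword arguments are my own names, not the rbdig API. It repeats encode_domain (using bytesize so the length octet counts bytes) and appends the query type and class as 16-bit integers.

```ruby
# Encode "example.com" as length-prefixed labels ending with a null label.
def encode_domain(domain)
  labels = domain.strip.split(".")
  labels.map { |label| [label.bytesize].pack("C") + label.b }.join + "\x00".b
end

# Hypothetical helper: header (12 bytes) followed by the question section.
def build_query(domain, query_id: 0x840f, flags: 0x0100)
  header = [query_id, flags, 1, 0, 0, 0].pack("n6") # one question, no other records
  question = encode_domain(domain) + [1, 1].pack("n2") # QTYPE 1 = A, QCLASS 1 = IN
  header + question
end

message = build_query("example.com")
puts message.unpack1("H*")
# => 840f01000001000000000000076578616d706c6503636f6d0000010001
```

Apart from the flags (0100 here, 0120 in the dig capture), the bytes match the request captured earlier.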

Step 2: Creating a socket and sending the DNS request

I won’t get into details here. The idea is to get this request out and listen for a response from the DNS server.
I created a UDP socket for this and wrapped everything in the connect method.

require 'socket'

def connect(message, server = '1.1.1.1', port = 53)
  socket = UDPSocket.new
  socket.send(message, 0, server, port)
  response, _ = socket.recvfrom(512) # RFC1035 specifies a 512 octets size limit for UDP messages
  response
ensure
  socket&.close # close the socket even if send or recvfrom raises
end
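One caveat with a plain recvfrom is that it blocks forever if the reply never arrives, which can happen with UDP. A possible variant, sketched here under my own name connect_with_timeout (not part of rbdig), waits with IO.select and gives up after a timeout.

```ruby
require "socket"

# Send a UDP message and wait at most `timeout` seconds for the reply.
def connect_with_timeout(message, server = "1.1.1.1", port = 53, timeout: 2)
  socket = UDPSocket.new
  socket.send(message, 0, server, port)

  # IO.select returns nil when nothing became readable within the timeout.
  ready = IO.select([socket], nil, nil, timeout)
  raise "No reply from #{server}:#{port} within #{timeout}s" unless ready

  response, _sender = socket.recvfrom(512)
  response
ensure
  socket&.close
end
```

The 2-second default is an arbitrary choice for this sketch; a real client would probably also retry or fall back to another server.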

Step 3: Receiving and parsing the DNS reply

The DNS server will send back the response, which might or might not include the answer (the IPv4 address in our case). After receiving the response, I validated that it has the same query_id as the request, and then started parsing it. Basically, I reversed the steps I used when building the request to parse the header and question sections. In addition, the DNS response might include 3 more sections, each including zero or more Resource Records (RRs):

Answers - the answer we’re looking for
Authorities (NS records) - when a nameserver doesn’t have the answer, it will redirect you to other servers
Additional - also used when a nameserver doesn’t have the answer; it includes the IPv4 addresses of the servers that might have it. This section could contain other data, but that’s out of the scope of this article.

These Resource Records (RRs) all have the same format.

Here is a visual of how the DNS response is structured (source: RFC1035):

+---------------------+
|        Header       |
+---------------------+
|       Question      |     # the question for the name server
+---------------------+
|        Answer       |     # RRs with the answer
+---------------------+
|      Authority      |     # RRs pointing toward authority servers
+---------------------+
|      Additional     |     # RRs holding additional information
+---------------------+

A word on `Reader`

The DNS response is a string of bytes that I needed to walk through while parsing. To keep track of where I was in the string, I created the Reader class. It gets initialized with a string and can read a specific number of bytes from it while keeping a pointer to the current position.
A brief example:

r = Reader.new("\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09\x0A".b)
r.pos     # => 0
r.read(2) # => "\x00\x01"
r.pos    # => 2

Ruby has the StringIO class, which does this and more. But for this project, I wanted to implement the functionality I needed.
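The Reader class isn't listed in this post. A minimal sketch that behaves like the example above could look as follows; treat it as my assumption of the interface (read, pos, pos=), not as the actual rbdig code. The EOFError on short reads is also my addition.

```ruby
# Minimal byte reader: hands out slices of a binary string and tracks the position.
class Reader
  attr_accessor :pos

  def initialize(bytes)
    @bytes = bytes.b # work on a binary copy so indexes are byte offsets
    @pos = 0
  end

  # Return the next `count` bytes and advance the position.
  def read(count)
    raise EOFError, "tried to read past the end of the buffer" if @pos + count > @bytes.bytesize

    chunk = @bytes.byteslice(@pos, count)
    @pos += count
    chunk
  end
end

r = Reader.new("\x00\x01\x02\x03".b)
r.read(2) # => "\x00\x01"
r.pos     # => 2
```

Making pos writable is what later allows jumping to another offset and coming back.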

The parsing class

I created the DNSResponse class responsible for handling the response. It accepts the raw response and initiates an instance of Reader with that bytes string:

class DNSResponse
  attr_reader :header, :body, :answers, :authorities, :additional
  def initialize(dns_reply)
    @buffer = Reader.new(dns_reply.b)
    @header = parse_header
    @body = parse_body
    @answers = parse_resource_records(@header[:an_count])
    @authorities = parse_resource_records(@header[:ns_count])
    @additional = parse_resource_records(@header[:ar_count])
  end

  def parse_header
    query_id, flags, qd_count, an_count, ns_count, ar_count = @buffer.read(12).unpack('n6')
    { query_id:, flags:, qd_count:, an_count:, ns_count:, ar_count: }
  end

  def parse_body
    question = extract_domain_name(@buffer)
    q_type = @buffer.read(2).unpack('n').first
    q_class = @buffer.read(2).unpack('n').first
    { question:, q_type:, q_class: }
  end
  [...]

Extracting the domain name was maybe the most complex part. Up to this point it was straightforward: I could transform \x07example\x03com\x00 into example.com and that would suffice.
However, I ran into some exceptions once I progressed to parsing the RR sections. Here is the method that does the parsing. Conveniently, all the RRs I care about for now have the same format.

# class DNSResponse
def parse_resource_records(num_records)
  # It returns an array of records if any
  num_records.times.collect do
    rr_name = extract_domain_name(@buffer)              # A domain name to which this RR belongs
    rr_type, rr_class = @buffer.read(4).unpack('n2')    # The type & class of this record
    ttl = @buffer.read(4).unpack('N').first             # Time-to-live for this record (how long it should be cached)
    rr_data_length = @buffer.read(2).unpack('n').first  # The length (bytes) of the rr_data field
    # Data describing the resource, variable length depending on the type of resource.
    # Ex: for TYPE='A' and CLASS='IN', the data = IPv4 address (4 bytes length)
    rr_data = extract_record_data(@buffer, rr_type, rr_data_length)
    { rr_name:, rr_type:, rr_class:, ttl:, rr_data_length:, rr_data: }
  end
end

# Sample RR
# {:rr_name=>"com", :rr_type=>2, :rr_class=>1, :ttl=>172800, :rr_data_length=>20, :rr_data=>"a.gtld-servers.net"}
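To make the fixed part of an RR more concrete, here is a small standalone sketch that decodes the bytes following the name of an A record. The byte string is hand-made for this illustration (TTL of 300 is an arbitrary value), and extract_record_data itself is not shown in this post, so the IPv4 formatting below is only my guess at what it does for type A.

```ruby
# Bytes after the name: TYPE=1 (A), CLASS=1 (IN), TTL=300, RDLENGTH=4, RDATA=4 bytes.
rr_tail = "\x00\x01\x00\x01\x00\x00\x01\x2c\x00\x04\x17\xc0\xe4\x50".b

# n = 16-bit big-endian, N = 32-bit big-endian.
rr_type, rr_class, ttl, rr_data_length = rr_tail.unpack("n2Nn")

# The data starts right after the 10 fixed bytes.
rr_data = rr_tail.byteslice(10, rr_data_length)

# For an A record, the data is the four octets of the IPv4 address.
ip = rr_data.unpack("C4").join(".")

puts "type=#{rr_type} class=#{rr_class} ttl=#{ttl} length=#{rr_data_length} ip=#{ip}"
# => type=1 class=1 ttl=300 length=4 ip=23.192.228.80
```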

Handling DNS compression and preventing loops

When the server encodes the DNS message, there might be repeated domain names. In order to keep the message size to a minimum, the domain system uses a compression scheme. If a certain value appeared beforehand in the message, instead of repeating the same name, it places a pointer to a previous occurrence of the same name. How does this look in practice?
If we search for example.com, the server might not have the answer, so it directs you to various .com TLD servers. It lists NS records, so, when it encodes the rr_name field, instead of repeating com, it points you to the question section which has the com value.

How does the pointer… point?
A domain label can have a maximum length of 63 characters (00111111 in binary). Notice those two leading zeros? They can be used to differentiate a label from a pointer. The octet that starts a pointer has its first two bits set to one: 11000000 (which is \xc0 in hex, 192 in decimal). The byte values starting with 01 and 10 are reserved for future use.
The remaining 14 bits indicate the offset, the position in the message where we can find the label.

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1  1|                OFFSET                   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Here is a reply from a DNS root server answering a query for example.com. The first pointer in the response is highlighted (\xc0\x14). We start by reading it and notice that the first byte (\xc0) has its two leading bits set, indicating a pointer. The remaining 6 bits of that byte, together with the 8 bits of the second byte, form the offset. Here the 6 bits are all zero and \x14 is 20 in decimal, so we need to go back to position 20 and read the label from there. The label is com in this example.

Notice also the other highlighted pointers. This shows how DNS compression saves space in the message by preventing repetition.
Section 4.1.4 of the RFC1035 describes the DNS compression.
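The bit arithmetic for the pointer can be tried on its own. A minimal sketch, with helper names of my choosing:

```ruby
POINTER_MASK = 0b11000000 # the two leading bits that mark a pointer
OFFSET_MASK  = 0b00111111 # the six low bits of the first byte

# True when the first byte of a label marks a compression pointer.
def pointer?(first_byte)
  (first_byte & POINTER_MASK) == POINTER_MASK
end

# Combine the low 6 bits of the first byte with the second byte (14 bits in total).
def pointer_offset(first_byte, second_byte)
  ((first_byte & OFFSET_MASK) << 8) | second_byte
end

puts pointer?(0xc0)             # => true
puts pointer?(0x07)             # => false (a plain label of length 7)
puts pointer_offset(0xc0, 0x14) # => 20
puts pointer_offset(0xc1, 0x02) # => 258
```

The last line shows why the first byte cannot simply be compared with \xc0: any offset above 255 spills into its low bits.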

Here is the code for extracting the domain name and handling DNS compression:

# class DNSResponse
def extract_domain_name(buffer)
  domain_labels = []
  loop do
    read_length = buffer.read(1).bytes.first
    break if read_length == 0
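    if (read_length & 0b11000000) == 0b11000000
      # Byte is a pointer (DNS compression): the 14-bit offset is the low
      # 6 bits of this byte followed by the 8 bits of the next one
      pointing_to = ((read_length & 0b00111111) << 8) | buffer.read(1).bytes.first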
      current_pos = buffer.pos
      buffer.pos = pointing_to
      domain_labels << extract_domain_name(buffer)
      buffer.pos = current_pos
      break
    else
      # Normal case, read the label as it is
      domain_labels << buffer.read(read_length)
    end
  end
  domain_labels.join(".")
end

Preventing an infinite loop

When RFC1035 was created, it didn’t warn about any harmful implementations of DNS compression. If we blindly follow the pointer without validating its value, we expose ourselves to memory corruption bugs and buffer overruns. This opens the gates to possible DoS and even RCE attacks.

For example, if the pointer is set to \xff\xff, the offset value will be 16383, way out of the bounds of a DNS packet. The same goes for decoding the domain name: we should make sure the value of a length octet is no more than 63, so we don’t read from other parts of memory.

And if a pointer’s offset leads back to the pointer itself (or to a name that eventually reaches it again), following it results in an infinite loop.

Here is the method, updated for handling these edge cases.
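The checks described above can be sketched as follows. This is not the rbdig method; it is a standalone illustration written under my own assumptions: read_name and MAX_POINTER_JUMPS are hypothetical names, the function works on a raw string plus an offset instead of the Reader, and the limit of 16 jumps is an arbitrary cap.

```ruby
MAX_POINTER_JUMPS = 16 # arbitrary cap chosen for this sketch

# Returns [domain_name, offset_after_the_name] or raises ArgumentError on bad input.
def read_name(message, start)
  bytes = message.b
  labels = []
  pos = start
  next_pos = nil # where parsing resumes; fixed when the first pointer is met
  jumps = 0

  loop do
    raise ArgumentError, "name runs past the end of the message" if pos >= bytes.bytesize
    length = bytes.getbyte(pos)

    if (length & 0xC0) == 0xC0
      raise ArgumentError, "truncated pointer" if pos + 1 >= bytes.bytesize
      target = ((length & 0x3F) << 8) | bytes.getbyte(pos + 1)
      # A pointer may only refer to an earlier position, never to itself or forward.
      raise ArgumentError, "pointer must reference an earlier offset" if target >= pos
      jumps += 1
      raise ArgumentError, "too many compression pointers" if jumps > MAX_POINTER_JUMPS
      next_pos ||= pos + 2
      pos = target
    elsif length > 63
      raise ArgumentError, "reserved label type"
    elsif length.zero?
      next_pos ||= pos + 1
      break
    else
      raise ArgumentError, "label runs past the end of the message" if pos + 1 + length > bytes.bytesize
      labels << bytes.byteslice(pos + 1, length)
      pos += 1 + length
    end
  end

  [labels.join("."), next_pos]
end

# "com" at offset 0, then "example" followed by a pointer back to offset 0.
message = "\x03com\x00" + "\x07example\xC0\x00"
name, next_offset = read_name(message, 5)
puts name        # => example.com
puts next_offset # => 15
```

The backward-only rule alone is not enough, because a name can run forward into the same pointer again; the jump counter is what finally stops such a loop.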

A simple query

Up until this point, I could do this basic query. Notice we’re asking Cloudflare’s DNS resolver, which will do all the work, sending subsequent queries to find the domain address (if not already cached).

domain = "example.com"
dns_resolver = "1.1.1.1"
query_id = "\x00\x01"

msg = DNSQuery.new(query_id).query_message(domain)
socket_response = connect(msg, dns_resolver)
raise "Invalid response: query ID mismatch." if socket_response[0..1] != query_id

dns_response = DNSResponse.new(socket_response).parse
if dns_response.answers.any?
  puts dns_response.answers.first[:rr_data]
else
  puts "Answer not found for #{domain}."
end
# => 23.215.0.138

Step 4: No answer on the first try? (looping and querying NS servers)

I wanted to see the whole DNS process, and until now, my request flags had the Recursion Desired (RD) bit set to one. This means I rely on the DNS server to handle any further queries until it finds the answer (if it supports recursion). The conversation will be:

me: Can you tell me the IP address for "example.com"?
DNS server: I don't have it, but I'll ask other servers and come back with an answer.

With RD set to zero, the discussion will be:

me: Can you tell me the IP address for "example.com"?
DNS server: I don't have it, but here is a list of servers who might know.

The new flag will then be \x00\x00, and I’ll also switch to querying one of the root servers (ex: l.root-servers.net at 199.7.83.42).
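As a quick check of that flag value, clearing the RD bit from the flags used so far gives exactly two zero bytes. A tiny sketch (the constant name is mine):

```ruby
RD_BIT = 0x0100 # Recursion Desired, bit 8 of the 16-bit flags field

recursive_flags = 0x0100                    # what the earlier queries used
iterative_flags = recursive_flags & ~RD_BIT # same flags with RD cleared

puts [recursive_flags].pack("n").unpack1("H*") # => 0100
puts [iterative_flags].pack("n").unpack1("H*") # => 0000
```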

This new modification means I need to send more queries if the first one doesn’t return an answer. I’ll use a loop and always check the answers section of the DNS response. If no answers, the DNS server will hopefully return a list of records (in the additional section) with their own IP addresses. I will use it to query the next servers, which are likely to have the answer.
In some cases, the response has no additional records, but instead, the authorities section contains a list of authoritative nameservers. They are presented with their domain names instead of the IP address, which requires me to find out their own IP address before querying them.

def lookup(domain)
  nameserver =  '199.7.83.42' # l.root-servers.net
  max_lookups = 10

  max_lookups.times do
    puts "Querying #{nameserver} for #{domain}"
    query_id = [rand(65_536)].pack('n') # random 16-bit ID (0..65535)
    msg = DNSQuery.new(query_id).query_message(domain)
    socket_response = connect(msg, nameserver)
    raise "Invalid response: query ID mismatch." if socket_response[0..1] != query_id

    dns_response = DNSResponse.new(socket_response).parse

    if dns_response.answers.any?
      # The query found an answer
      return dns_response.answers[0][:rr_data]
    end

    if dns_response.additional.any?
      # No answer, try querying these additional resources
      nameserver = dns_response.additional[0][:rr_data]
      next
    end

    if dns_response.authorities.any?
      # No answer, but here are the authority servers that might know the answer
      ns_name = dns_response.authorities[0][:rr_data] # An example: a.iana-servers.net
      # Lookup authority server's IP address
      nameserver = lookup(ns_name)
      next
    end
  end
  raise "Max lookups reached."
end

result = lookup('example.com')
puts "\nAnswer: #{result}"

Querying 199.7.83.42 for example.com
Querying 192.5.6.30 for example.com
Querying 199.7.83.42 for a.iana-servers.net
Querying 192.5.6.30 for a.iana-servers.net
Querying 199.43.135.53 for a.iana-servers.net
Querying 199.43.135.53 for example.com

Answer: 23.192.228.80

In the latest implementation of my project, the same can be achieved with the command:

./dig.rb -t -s example.com

This was the implementation of a recursive, and then iterative, DNS query from scratch. There are countless improvements and features that can be added to this project, like:

support querying other record types, and other record classes
DNSSEC
ability to resolve a list of domain names, instead of a single domain
support for reverse DNS lookups
etc.

Just some features I might add in the future.

2024 annual review

2025-01-06T00:00:00+00:00

I’m continuing the practice of reflecting on the year that passed.

Looking back at 2024, I will answer these questions:

What went well this year?
What didn’t go so well this year?
What am I working towards?

1. What went well this year?

My travel in South America

This trip has been my plan for a very long time, and I’m so happy to finally be able to take it.
It’s been ten months of wandering on this beautiful continent. During this time, I met many lovely people, made some strong connections, and even started building a relationship with a special person. My Spanish has improved here, and I now feel comfortable engaging in more advanced conversations.
It’s hard to capture my whole journey here.
I volunteered at a hostel in Montevideo for around five weeks, then at a farm outside Buenos Aires, and finally a month at a dog shelter in Peru.

I’ve spent quite some time in Uruguay. There is a wonderful tech community here, and I was pleasantly surprised by its size. Ruby/Rails meetups are regularly held here. I joined a couple of them, and one even had around 50-60 participants. Everybody is so welcoming and approachable; they even applauded me for being from Romania. They are definitely doing an excellent job bringing the community together.
The infosec community is not lacking here either. There are almost monthly OWASP meetups, culminating with the December meetup by OWASP Rio de la Plata. There were around 600 participants and lots of awesome speakers. The talks were in Spanish, but I was able to understand them (ok, maybe around 80% of what was going on).

As I traveled almost this entire year, the socializing opportunities were all around. I was looking for hostel stays. Although intense at times, being in this highly social environment is beneficial for building social skills.
All this helped me become comfortable with new spaces, and approaching people I don’t know yet. It even builds resilience by adapting to the unknown and to diverse perspectives.

The ability to meet people, form connections, and exchange ideas is so important. Even more essential in the era of AI where the threat of replacement looms over everyone. That’s why I’m stressing fine-tuning these abilities.

My first contribution to an open-source project

Another thing I had on my radar for a long time. Contributing to open-source projects always felt intimidating. This past year I committed to this, got over my insecurities, and made it happen.
My very first contribution went to the Casa project from the Ruby for Good organization. This is a group of tech professionals who create software solutions for social good.

The maintainers were patient and communicated well, resulting in a positive experience.

Soon after, I pushed other contributions toward Casa and another Ruby for Good project, Human Essentials.

In the future, I intend to contribute more towards these positive projects. I recommend this to everybody thinking about contributing, just take the leap. It is an awesome way to shape your skills, meet others in this field, and learn from people more advanced than you.

2. What didn’t go so well this year?

This was a truly positive year, and I have no regrets. It took me a while to think about what would fall under this header.
But some things can be mentioned.

Not having a routine to learn cybersec

Traveling and changing places so often is not conducive to following a routine while learning. I didn’t get as much time as I wanted to sit down and learn more about this field. I focused on exploring my surroundings, meeting people, and improving my social skills. This made the most sense, as I was fortunate enough to be on this trip and wanted to take advantage of it.

Nevertheless, I didn’t fully stop learning. I spent most of the time on the Hack The Box platform which offers some of the best learning resources and labs I found.
I finished their “Information Security Foundations” path and almost all of the “SOC Analyst” job role path, plus other misc resources. This adds up to around 22 completed modules.

Not being consistent with sport/physical exercise

This is again related to the absence of a routine. Besides some hiking in nature and exploring the cities on foot, I haven’t practiced physical exercise as often as I wanted.
This lack of exercise catches up with me, and I feel I’m becoming more sluggish. That is not just physical, but mental as well. Even my creativity gets affected after a while.
I realized that I don’t like indoor gyms, except for bouldering walls, and running in a city with so many cars isn’t enjoyable for me.

3. What am I working towards?

Continue my cybersec journey (getting certified)

I’m committed to this transition and there are so many new things to learn. This time of travel allowed me to explore this field better. Playing around with different aspects of cybersecurity, I’m now leaning more toward blue teaming/defensive security. This is generally considered the more boring side of cybersec, but for some reason, it grew on me. I don’t find it boring at all. Discovering ways to harden a system, analyzing malware, or searching the logs for post-attack artifacts all sounds fascinating. And even more, wondering about ways to integrate AI into this.
Blue teaming is in itself a grand system, and I’m still figuring out my place in it. Inspired by some people in the cybersec space, I started documenting my journey in this field.

Connecting with professionals in the cybersec space

Simply gathering knowledge is not enough; meeting colleagues is an essential part. I realized I don’t personally know many people in this field, maybe two that I can think of. When I switched to programming, I was in the same spot, but eventually, I got to build a good network of mentors and peers.
These next years, I’m looking to do the same in the cybersecurity field.

More contributing to open-source projects

After my initial contribution, I recognized its learning potential. You get access to a legacy project with more advanced issues. These complex issues will challenge you when finding ways to solve them. It also improves communication and collaboration skills: if you’re stuck, you can get help from the team. Also, as the code and git history are public, you can see how others have solved more complex issues before.
In the future, I’m looking to contribute more to these positive projects, focussed more on those related to cybersecurity and ruby/rails.

Place an SSH honeypot

2024-10-28T00:00:00+00:00

After deploying my VPS and taking steps to secure it, I had the original SSH port (22) inactive. But it kept me curious about the default SSH activity going on there. How much brute forcing is happening on a publicly exposed server? I started experimenting with honeypots to find out more.

But first, let’s get the definition out of the way. In cybersecurity, a honeypot is a decoy resource designed to look like a legitimate target. It is often deployed to distract attackers from the important resources on the network, and/or to profile potential threats.

Choosing a honeypot

There are plenty of honeypots available, each for a different purpose, deployment context, OS, and network system.
For my needs, I wanted a low-interaction SSH honeypot, not too resource-intensive, compatible with Linux, and relatively easy to set up and understand.

After tinkering with some of them, I’m describing here the Basic SSH Honeypot created by Simon Bell.
I’ve forked and updated it to suit my needs, and you can find it here.

Prerequisites

Important: This honeypot setup is only meant to be tested on a vanilla installation of Ubuntu.

I highly recommend having a simple VPS exclusively for testing honeypots; unless you know what you’re doing, don’t play with this on your production server. Although the chance is tiny, honeypots might have (undiscovered) vulnerabilities that allow attackers to escape the honeypot, so to say, and get into the server.
Also, a good idea is to create a dedicated, non-root user for running the honeypot.
Never run a honeypot with sudo privileges. If an attacker manages to break out of the honeypot, they will have sudo access to the server.

Ubuntu 24.04.1 or similar
Docker installed (can be installed following these instructions)
A non-root user handling the docker container. Follow the steps here
git
ufw
Optional, but recommended, running the docker in rootless mode

Set up the SSH honeypot

First, set a firewall rule to redirect SSH requests from port 22 to 2222 (a non-privileged port).

sudo iptables -t nat -A PREROUTING -p tcp --dport 22 -j REDIRECT --to-port 2222

# And, if your firewall is enabled, allow connections to port 2222
sudo ufw allow 2222/tcp

From now, you should not use the sudo command anymore.
Clone the repository from above:

git clone https://github.com/panacotar/basic_ssh_honeypot.git && cd basic_ssh_honeypot

Create the RSA key pair:

ssh-keygen -t rsa -f server.key 
# When asking for a password, just skip it (press enter)

# Rename the public key
mv server.key.pub server.pub

Build the Docker image (provided you added your user to the docker group as described in the prerequisites):

docker build --no-cache -t basic_sshpot .

Then run it:

docker run -d -v ${PWD}:/usr/src/app -p 2222:2222 basic_sshpot

Some parameters here:

-d (--detach) - runs the container in the background. It prints the new container’s ID and you’ll get the prompt back.
-v (--volume) - creates a bind mount, creating the ssh_honeypot.log file in the current directory.

The honeypot now listens to incoming SSH connections and logs them to the log file (ssh_honeypot.log).
Run ss -tulpn to check the open ports, you should see the honeypot running:

Netid         State          Recv-Q          Send-Q                   Local Address:Port                    Peer Address:Port         Process
[...]
tcp           LISTEN         0               4096                              [::]:2222                            [::]:*  

After running the honeypot for a while, you will find its logs in the current directory, ssh_honeypot.log. You can also view them live with the command:

tail -f ssh_honeypot.log

Stopping the dockerized honeypot

docker stop $(docker ps -a -q  --filter ancestor=basic_sshpot)

Other honeypots

Cowrie - a great alternative that is very simple to set up and use. They also provide helpful documentation.
OpenCanary - modular and decentralized honeypot daemon that runs several canary versions of services that alert when a service is (ab)used.
T-Pot - all-in-one honeypot appliance (can be resource intensive)
ssh_honeypot - a light alternative, it logs the IP address, username, and password

Benchmark your ruby code

2024-10-11T00:00:00+00:00

Ruby offers an easy way to benchmark the code. Here is some syntax for basic benchmarking.

This is done with the Benchmark module included in the Ruby standard library. You can run it even in IRB, simply require "benchmark".
In its simplest form, Benchmark.measure accepts a code block and outputs the time it takes to execute it.

require "benchmark"

puts Benchmark.measure { sleep(1) }

Returning:

  0.000061   0.000031   0.000092 (  1.001175)

The meaning of these stats (measured unit is second):

user CPU time   system CPU time   user + system CPU time   elapsed real time
  0.000061        0.000031          0.000092                 (  1.001175)
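The same numbers are available programmatically, since Benchmark.measure returns a Benchmark::Tms object. A short sketch; the 0.1 second sleep is just a placeholder workload:

```ruby
require "benchmark"

result = Benchmark.measure { sleep(0.1) }

# Each column of the printed line has a matching reader on the result.
puts format("user=%.6f system=%.6f total=%.6f real=%.6f",
            result.utime, result.stime, result.total, result.real)

# When only the wall-clock time matters, Benchmark.realtime returns it as a Float.
elapsed = Benchmark.realtime { sleep(0.1) }
puts format("elapsed=%.3fs", elapsed)
```

Because sleep barely uses the CPU, the user and system values stay close to zero while the real time is about 0.1 seconds.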

A more advanced form is using Benchmark.bm, this allows us to compare the execution of different code blocks:

require "benchmark"

arr = (1..100_000).map { rand }
Benchmark.bm do |x|
  # Each x.report is a different test item to compare against
  x.report { arr.dup.sort }
  x.report { arr.dup.sort! }
end

   user     system      total        real
0.021121   0.001285   0.022406 (  0.022459)
0.018150   0.003547   0.021697 (  0.021704)

We can also label the reports x.report("sort").
Also, we can provide predefined methods in order to compare them.
Using Benchmark.bmbm runs the tests twice, a rehearsal pass followed by the measured pass, for a (supposedly) better reading.

require "benchmark"

arr = (1..100_000_000).map { rand }
def first_method(arr)
  arr.last
end

def second_method(arr)
  arr[-1]
end

Benchmark.bmbm do |x|
  x.report("first_method") { 100_000.times do; first_method(arr); end }
  x.report("second_method") { 100_000.times do; second_method(arr); end }
end

Rehearsal -------------------------------------------------
first_method    0.004724   0.000000   0.004724 (  0.004793)
second_method   0.004317   0.000000   0.004317 (  0.004381)
---------------------------------------- total: 0.009041sec

                    user     system      total        real
first_method    0.005145   0.000000   0.005145 (  0.005263)
second_method   0.004220   0.000000   0.004220 (  0.004278)

Benchmark-ips

Another performance gem, built on the Benchmark module from above. Benchmark-ips measures how many times a code block can run in a second (iterations per second - IPS) rather than measuring the time it takes for a code block to run.

You have to install the gem: gem install benchmark-ips. The syntax is:

require "benchmark/ips"

arr = (1..100_000_000).map { rand }
def first_method(arr)
  arr.last
end

def second_method(arr)
  arr[-1]
end

Benchmark.ips do |x|
  x.report("first method") { first_method(arr) }
  x.report("second method") { second_method(arr) }

  x.compare!
end

ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
Warming up --------------------------------------
        first method     1.685M i/100ms
       second method     1.921M i/100ms
Calculating -------------------------------------
        first method     16.390M (± 3.2%) i/s   (61.01 ns/i) -     82.542M in   5.042099s
       second method     18.257M (± 1.1%) i/s   (54.77 ns/i) -     92.216M in   5.051680s

Comparison:
       second method: 18256810.3 i/s
        first method: 16389785.2 i/s - 1.11x  slower

Fix the N+1 queries in Rails

2024-09-02T00:00:00+00:00

The N+1 query problem is a common performance issue encountered in Rails applications.

There are tools to detect this problem automatically. And Active Record provides ways to fix it.

The problem

This is related to having associations and the way we load the respective records. Active Record simplifies database interaction, but it can lead to issues like N+1 query. By default, Active Record uses a lazy loading approach, meaning it only loads records when they are accessed.

The N+1 query issue occurs when the application queries the database, loops over the results, and executes a separate query for the associated records of each item in the list.

An example association: Users having many Dogs.

class User < ApplicationRecord
  has_many :dogs
end

class Dog < ApplicationRecord
  belongs_to :user
end

It is common in the Rails app to load all records and then loop over them, accessing their associated model (for instance, wanting to display the records in an index view).

# rails c
User.all.each { |u| puts u.dogs };nil

Here, we list each user’s dogs. While the code works correctly, it triggers too many database queries. Specifically, it prompts Active Record to execute one query to fetch the users and one additional query for each user in the database (a total of 1+N queries):

User Load (0.1ms)  SELECT "users".* FROM "users"
Dog Load (0.1ms)  SELECT "dogs".* FROM "dogs" WHERE "dogs"."user_id" = ?  [["user_id", 1]]
Dog Load (0.1ms)  SELECT "dogs".* FROM "dogs" WHERE "dogs"."user_id" = ?  [["user_id", 2]]
[...]
Dog Load (0.0ms)  SELECT "dogs".* FROM "dogs" WHERE "dogs"."user_id" = ?  [["user_id", 300]]

This can slow down the app and result in a high database load, especially in apps with large datasets. For an app with 100,000 users, there will be 1 + 100,000 queries.

The Active Record solution

One solution provided by Active Record is to eager load the associated records upfront. This is achieved with the #includes query method, allowing the app to load users and all their dogs in two queries. It avoids the N+1 query problem.

# rails c
User.includes(:dogs).all.each { |u| puts u.dogs };nil

User Load (0.1ms)  SELECT "users".* FROM "users"
Dog Load (0.4ms)  SELECT "dogs".* FROM "dogs" WHERE "dogs"."user_id" IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)  [["user_id", 1], ["user_id", 2], ["user_id", 3], ["user_id", 4], ["user_id", 5], ["user_id", 6], ["user_id", 7], ["user_id", 8], ["user_id", 9], ["user_id", 10]]
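The difference in query counts can be illustrated without a database. The sketch below is a plain-Ruby stand-in, not Active Record: FakeDb and its methods are invented for this example and simply count how many times they are called.

```ruby
# A stand-in for the database that only counts the "queries" it receives.
class FakeDb
  attr_reader :query_count

  def initialize(dogs_by_user)
    @dogs_by_user = dogs_by_user
    @query_count = 0
  end

  # SELECT * FROM users
  def all_user_ids
    @query_count += 1
    @dogs_by_user.keys
  end

  # SELECT * FROM dogs WHERE user_id = ?
  def dogs_for(user_id)
    @query_count += 1
    @dogs_by_user.fetch(user_id, [])
  end

  # SELECT * FROM dogs WHERE user_id IN (...)
  def dogs_for_all(user_ids)
    @query_count += 1
    @dogs_by_user.slice(*user_ids)
  end
end

data = { 1 => ["Rex"], 2 => ["Luna", "Max"], 3 => [] }

# Lazy loading: one query for the users, then one per user.
lazy = FakeDb.new(data)
lazy.all_user_ids.each { |id| lazy.dogs_for(id) }
lazy_queries = lazy.query_count

# Eager loading: one query for the users, one for all their dogs.
eager = FakeDb.new(data)
ids = eager.all_user_ids
dogs = eager.dogs_for_all(ids)
ids.each { |id| dogs[id] } # served from memory, no extra query
eager_queries = eager.query_count

puts "lazy: #{lazy_queries} queries, eager: #{eager_queries} queries"
# => lazy: 4 queries, eager: 2 queries
```

With three users the gap is small, but the lazy count grows with every user while the eager count stays at two.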

The case of nested associations

What happens if we need to access data from a nested association?
Let’s say each dog has_many toys, and we want to print the number of toys for each dog.
If we simply add a call to d.toys.size, the users and dogs are still eager loaded, but the toys are not: Active Record runs one COUNT query per dog.

# rails c
User.includes(:dogs).all.each { |u| u.dogs.each { |d| puts d.toys.size } };nil

User Load (0.1ms)  [...]
Dog Load (0.4ms)  [...]
Toy Count (0.1ms)  SELECT COUNT(*) FROM "toys" WHERE "toys"."dog_id" = ?  [["dog_id", 1]]
Toy Count (0.1ms)  [...] 
Toy Count (0.0ms)  [...]
[...]
Toy Count (0.0ms)  [...]

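To eager load the nested association as well, pass a hash to #includes: User.includes(dogs: :toys) or User.includes(dogs: [:toys]). The toys are then loaded in a single third query, and d.toys.size is computed from the already loaded records: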

# rails c
User.includes(dogs: :toys).all.each { |u| u.dogs.each { |d| puts d.toys.size } };nil

User Load (0.2ms)  [...]
Dog Load (0.4ms)   [...]
Toy Load (0.5ms)   [...]
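As a rough back-of-the-envelope check, the number of queries for the three variants can be written down directly. The helper below is a hypothetical illustration, assuming every level has at least one record and that each dog’s toys are only accessed through size as in the examples above.

```ruby
# Hypothetical helper: expected query counts for the nested example,
# given the number of users and the total number of dogs.
def query_counts(users:, dogs:)
  {
    # 1 users query + 1 dogs query per user + 1 toy count per dog
    no_includes: 1 + users + dogs,
    # users + dogs in 2 queries, then 1 toy count per dog
    includes_dogs: 2 + dogs,
    # users, dogs, and toys in 3 queries
    includes_dogs_and_toys: 3
  }
end

counts = query_counts(users: 300, dogs: 450)
counts.each { |variant, total| puts "#{variant}: #{total}" }
```

Only the fully nested includes keeps the count constant as the data grows.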

Useful tools

The Bullet gem can be added to your app. It watches the queries your app makes while you use it and notifies you when it detects N+1 queries. It also notifies you when you’re using eager loading that isn’t necessary and when you should use a counter cache. Make sure to add it to the development group of your Gemfile.
Once Bullet detects an N+1 query issue, it will trigger a warning:

user: john
GET /
USE eager loading detected
  User => [:dogs]
  Add to your query: .includes([:dogs])
Call stack
[...]
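
A minimal development setup might look like the sketch below. The option names are taken from Bullet’s README; which notifiers you enable is a matter of preference, and this selection is only an assumption about a typical setup.

```ruby
# Gemfile
gem "bullet", group: :development

# config/environments/development.rb
Rails.application.configure do
  config.after_initialize do
    Bullet.enable        = true
    Bullet.bullet_logger = true # write warnings to log/bullet.log
    Bullet.rails_logger  = true # also add warnings to the Rails log
    Bullet.add_footer    = true # show warnings at the bottom of the page
  end
end
```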
