AI-Powered E2E Testing with Midscene.js and Playwright
E2E testing without writing selectors. In this post, we'll try using Midscene.js to control the UI with natural-language commands, together with Playwright, the popular framework for automated browser testing.
What is Midscene.js?
Midscene.js is a JavaScript SDK that uses LLM models (such as GPT-4o or Qwen) to interpret commands like:
"Type art toy and search"
"Extract all product names and prices"
and converts them into actual browser interactions, without writing any DOM/selector code.
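To make that concrete, with the Playwright fixtures used later in this post those two commands might look roughly like this (the prompt wording and the {name, price} result shape are only illustrative):

// inside a Playwright test that uses the Midscene fixture (see below)
await ai('type "art toy" in the search box and click the search button');

// extract structured data from whatever is currently visible on the page
const products = await aiQuery(
  "{name: string, price: number}[], all product names and prices on the page"
);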
Integrate with Playwright
For those who want to see the full Playwright config, including the E2E code itself, check out the example project from the Midscene team: https://github.com/web-infra-dev/midscene-example/blob/main/playwright-demo and the integration guide: https://midscenejs.com/integrate-with-playwright.html
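A rough sketch of what that config involves: testDir and timeout are ordinary Playwright settings, while the Midscene reporter entry is an assumption based on the docs I've seen, so copy the exact line from the linked example project.
./playwright.config.ts
import { defineConfig } from "@playwright/test";

export default defineConfig({
  testDir: "./e2e",
  timeout: 90 * 1000,
  // assumption: the demo registers Midscene's visual HTML reporter roughly like this
  reporter: [["list"], ["@midscene/web/playwright-reporter"]],
});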
Here we'll try searching for "chaka" on Popmart.
./e2e/popmart-search.ts
import { expect } from "@playwright/test";
import { test } from "./fixture";

test.beforeEach(async ({ page, aiTap }) => {
  await page.setViewportSize({ width: 1280, height: 768 });
  await page.goto("https://www.popmart.com/th");
  await page.waitForLoadState("load");
  // accept the cookie/privacy banner before each test
  await aiTap('click the "ยอมรับ" (accept) button for privacy policy at the bottom of the page');
});

test("search chaka on popmart", async ({
  ai,
  aiQuery,
  aiAssert,
  aiWaitFor,
  aiNumber,
  aiBoolean,
  aiString,
  aiLocate,
}) => {
  // 👀 type keywords, perform a search
  await ai('type "chaka" in search box located in the top appbar, hit Enter');

  // 👀 wait for the loading
  await aiWaitFor("there is at least one item on the page");

  // 👀 find the items
  const items = await aiQuery(
    "{itemTitle: string, price: number}[], find item in list and price"
  );

  console.log("items", items);
  expect(items?.length).toBeGreaterThan(0);
});
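The test above imports test from ./fixture. A minimal sketch of that fixture file, assuming the PlaywrightAiFixture export described in the Midscene Playwright guide (check the example project for the exact, version-matching code):
./e2e/fixture.ts
import { test as base } from "@playwright/test";
import type { PlayWrightAiFixtureType } from "@midscene/web/playwright";
import { PlaywrightAiFixture } from "@midscene/web/playwright";

// extends Playwright's test with the ai / aiTap / aiQuery / ... fixtures used above
export const test = base.extend<PlayWrightAiFixtureType>(PlaywrightAiFixture());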
Step by step
1. Go to https://www.popmart.com/th and wait until the page finishes loading.
2. Use aiTap to click the "ยอมรับ" (accept) button in the cookie-consent banner.

Another advantage of using Midscene is that it reports each action visually.

3. ai('type "chaka" in search box located in the top appbar, hit Enter'): tell the AI to find the search box in the top appbar, type "chaka", and press Enter.
If you try this on the actual website, you'll see it doesn't simply click and type in that box: the UI opens a drawer-like search panel, and Midscene handles this too.
4. await aiWaitFor("there is at least one item on the page"): wait until items appear.
5. aiQuery: ask it to extract the items and their prices with "{itemTitle: string, price: number}[], find item in list and price".
6. Expect items to exist. In practice, we could also add a condition that the items must be related to the keyword we searched for (in this case, chaka); see the sketch below.

But we should note that Midscene works from a visual perspective, on an image of the page, meaning it only sees what's shown on the screen.
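A sketch of that extra keyword check, using the aiAssert fixture already destructured in the test above plus a plain deterministic filter (the assertion wording is just an example):

// visual check: ask the model whether the visible results match the keyword
await aiAssert('the product list shows items related to "chaka"');

// deterministic check on the extracted data itself
expect(
  items.filter((item) => item.itemTitle.toLowerCase().includes("chaka")).length
).toBeGreaterThan(0);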

items [
  { itemTitle: 'CHAKA Candle Whisper Series Figures', price: 380 },
  { itemTitle: 'CHAKA Candle Whisper Series Figures', price: 380 },
  {
    itemTitle: 'MEGA SPACE MOLLY 400% Sanrio Characters Series',
    price: 6290
  }
]
Report
After running, we can view a report of each action, which shows whether it was an AI action, how many tokens were used, and how long it took.

Caching
I believe many people will ask, "Won't this be expensive and slow if we have to call the AI every time?"
Midscene supports caching the planning steps and the located DOM XPaths to reduce calls to the AI model. After the first test run, it writes the cache as a YAML file:
midscene_run/cache/popmart.spec.ts(search-chaka-on-popmart).cache
midsceneVersion: 0.17.0
cacheId: popmart.spec.ts(search-chaka-on-popmart)
caches:
  - type: locate
    prompt: >-
      click the "ยอมรับ" (accept) button for privacy policy at the bottom of the
      page
    xpaths:
      - //*[@id="__next"]/div[1]/div[1]/div[4]/div[1]/div[2]/text()
  - type: locate
    prompt: The search box in the top appbar
    xpaths:
      - >-
        //*[@id="__next"]/div[1]/div[1]/div[1]/div[1]/div[2]/div[2]/div[1]/div[1]/div[1]/div[1]/div[1]/div[1]/img[1]
  - type: plan
    prompt: type "chaka" in search box located in the top appbar, hit Enter
    yamlWorkflow: |
      tasks:
        - name: type "chaka" in search box located in the top appbar, hit Enter
          flow:
            - aiInput: chaka
              locate: The search box located in the top appbar
            - aiKeyboardPress: Enter
  - type: locate
    prompt: The search box located in the top appbar
    xpaths:
      - >-
        //*[@id="__next"]/div[1]/div[1]/div[1]/div[1]/div[2]/div[1]/div[1]/text()
This is an example after running the Popmart search test; you can see it has stored the plan and the XPaths.
The next time we run with the cache, Midscene replays these steps instead of calling the AI.
The exceptions are aiBoolean, aiQuery, and aiAssert, which won't be cached because they're queries, not actions.
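Per the Midscene docs (worth double-checking for your version), caching is opt-in: enable it by setting the MIDSCENE_CACHE environment variable when running the tests, for example:
MIDSCENE_CACHE=true npx playwright test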
AI Model
Since Midscene.js is an open-source project that isn't tied to any particular cloud provider or model, we can choose which model to use and deploy it either publicly or privately.
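As a rough sketch of what that configuration looks like (the variable names below are taken from the Midscene configuration docs as I recall them, so treat them as assumptions and verify against the official docs for your version):

# .env: point Midscene at an OpenAI-compatible endpoint of your choice
OPENAI_API_KEY="sk-..."
OPENAI_BASE_URL="https://your-provider-or-self-hosted-endpoint/v1"
MIDSCENE_MODEL_NAME="gpt-4o"   # or, say, a self-hosted Qwen2.5-VL model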
Advantages of Using AI for E2E Testing Compared to Traditional Testing
- Understands the UI without being tied to DOM selectors
  - If the UI changes an element's type (e.g. <button> → <a>), traditional test scripts will break
  - AI instead looks at UI context such as text, color, and position, or even visual expectations like "is it a circle? is it red?"
- Easy communication with non-dev team members
  - PMs and designers can read the tests
  - Or write basic prompts themselves without knowing JavaScript
  - For example, "open the settings page and check if there's a toggle for dark mode"
- Suitable for testing business logic / high-level scenarios
- Easier to write tests for dynamic UI
  - For example, "click confirm and there should be a success dialog at the top right corner"
  - Traditional tests might need many branches, but AI understands the context better
However, we still shouldn't cram all AI actions into a single command. It's better to keep the actions clearly separated, because this is a test and we don't want non-deterministic behavior on every run; see https://midscenejs.com/blog-programming-practice-using-structured-api.html and the sketch below.
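For example, the single ai(...) instruction in the test above could be rewritten with Midscene's more atomic fixtures. A sketch assuming the aiInput / aiKeyboardPress methods described in the structured-API article linked above (they would also need to be added to the fixtures destructured in the test):

// instead of: await ai('type "chaka" in search box located in the top appbar, hit Enter')
await aiInput("chaka", "the search box in the top appbar"); // type into a located element
await aiKeyboardPress("Enter");                             // press Enter as a separate, deterministic step
await aiWaitFor("there is at least one item on the page");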
Conclusion
In my view, AI-based testing is like having an "assistant that understands the UI" help write tests. It doesn't replace everything, and we need to understand that limitation, but it does solve problems in real user-interaction flows. We can think about a test from a visual perspective: when the user does this, this should happen on screen. Sometimes locating an element with selectors is very difficult, yet from an actual user's perspective the step is very simple.