How can I crawl a URL "protected" by a sign-in page?

I want to use GPT Crawler on a page that requires you to be signed in.
How can I do that?

Hi @zanamich ,

To use GPT Crawler on a page that requires sign-in, you’ll need to follow these steps:

  1. Sign in to the Website: First, manually sign in to the website using your credentials in your browser.
  2. Obtain Cookies or Session Tokens: After signing in, you’ll need to extract the session cookies or tokens that are used for authentication. You can do this using your browser’s developer tools:
    • In Chrome, open Developer Tools, go to the “Application” tab, and find the cookies under “Storage” > “Cookies”.
  3. Use Authentication Cookies in GPT Crawler: Once you’ve gathered the cookies or session tokens, configure GPT Crawler to include them in its requests. Depending on the crawler, this could mean setting the Cookie header or passing the session token in the request headers.
  4. Handle Session Expiry: Keep in mind that authentication sessions often expire. You’ll need to refresh the session by either:
    • Automatically refreshing the token or cookies if the crawler supports it.
    • Manually re-signing in and updating the session information in the crawler when necessary.
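
As a rough sketch of step 3 (not GPT Crawler's own API), "setting the cookie header" amounts to serializing your saved cookies into a single `Cookie` header and attaching it to each request. The cookie names and values below are placeholders; copy the real ones from your browser's DevTools after signing in:

```typescript
// Sketch: build an HTTP Cookie header from extracted cookies.
type CookiePair = { name: string; value: string };

function toCookieHeader(cookies: CookiePair[]): string {
  // Cookie headers are "name=value" pairs joined by "; "
  return cookies.map((c) => `${c.name}=${c.value}`).join("; ");
}

// Placeholder values; replace with the cookies copied from DevTools.
const cookies: CookiePair[] = [
  { name: "session_id", value: "abc123" },
  { name: "csrf_token", value: "xyz789" },
];

// Attach the header to each crawl request, e.g. with fetch:
async function fetchProtectedPage(url: string) {
  return fetch(url, { headers: { Cookie: toCookieHeader(cookies) } });
}
```

If the session expires (step 4), you only need to refresh the values in `cookies` and re-run.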

I hope that answers your question.

Regards,

Thank you @builder_akash!
Is it possible to get a syntax example of cookie usage? Thank you

Here’s an example config for everyone to use:

```typescript
export const defaultConfig: Config = {
  url: "https://www.builder.io/c/docs/developers",
  match: "https://www.builder.io/c/docs/**",
  selector: `.docs-builder-container`,
  maxPagesToCrawl: 50,
  outputFileName: "output.json",
  cookie: [
    { name: "my_cookie", value: "my_cookie_value" },
    { name: "another_cookie", value: "another_value" }
  ]
};
```