How can I crawl a URL "protected" by a sign-in page?

I want to use GPT Crawler on a page that requires you to be signed in.
How can I do that?

Hi @zanamich ,

To use GPT Crawler on a page that requires sign-in, you’ll need to follow these steps:

  1. Sign in to the Website: First, manually sign in to the website using your credentials in your browser.
  2. Obtain Cookies or Session Tokens: After signing in, you’ll need to extract the session cookies or tokens that are used for authentication. You can do this using your browser’s developer tools:
    • In Chrome, open Developer Tools, go to the “Application” tab, and find the cookies under “Storage” > “Cookies”.
  3. Use Authentication Cookies in GPT Crawler: Once you’ve gathered the cookies or session tokens, configure GPT Crawler to include them in its requests. Depending on the crawler, this could mean setting the Cookie header or passing the session token in the request headers.
  4. Handle Session Expiry: Keep in mind that authentication sessions often expire. You’ll need to refresh the session by either:
    • Automatically refreshing the token or cookies if the crawler supports it.
    • Manually re-signing in and updating the session information in the crawler when necessary.
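
As a rough sketch of step 3 (not GPT Crawler's own API), "setting the cookie header" amounts to serializing your saved cookies into a single `Cookie` header and attaching it to each request. The cookie names and values below are placeholders; copy the real ones from your browser's DevTools after signing in:

```typescript
// Sketch: build an HTTP Cookie header from extracted cookies.
type CookiePair = { name: string; value: string };

function toCookieHeader(cookies: CookiePair[]): string {
  // Cookie headers are "name=value" pairs joined by "; "
  return cookies.map((c) => `${c.name}=${c.value}`).join("; ");
}

// Placeholder values; replace with the cookies copied from DevTools.
const cookies: CookiePair[] = [
  { name: "session_id", value: "abc123" },
  { name: "csrf_token", value: "xyz789" },
];

// Attach the header to each crawl request, e.g. with fetch:
async function fetchProtectedPage(url: string) {
  return fetch(url, { headers: { Cookie: toCookieHeader(cookies) } });
}
```

If the session expires (step 4), you only need to refresh the values in `cookies` and re-run.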

I hope that answers your question.

Regards,

Thank you @builder_akash!
Is it possible to get a syntax example of cookie usage? Thank you

Here’s an example config for everyone to use:

```typescript
export const defaultConfig: Config = {
  url: "https://www.builder.io/c/docs/developers",
  match: "https://www.builder.io/c/docs/**",
  selector: `.docs-builder-container`,
  maxPagesToCrawl: 50,
  outputFileName: "output.json",
  cookie: [
    { name: "my_cookie", value: "my_cookie_value" },
    { name: "another_cookie", value: "another_value" }
  ]
};
```