Crawler html is empty when retrieved from a function or class method #418

Open
monkeyArms opened this issue Feb 5, 2021 · 2 comments · May be fixed by #425

Comments

@monkeyArms

I ran into what I consider a 'bizarre' issue the other day:

When a Symfony\Component\Panther\Client instance is created and its Symfony\Component\Panther\DomCrawler\Crawler instance is retrieved and used within the same function/method, everything works.

However, if the Crawler is returned from another function/method that created the Client, Crawler::html() returns an empty string.

Example class:

<?php

namespace App\Test;

use Symfony\Component\Panther\Client;
use Symfony\Component\Panther\DomCrawler\Crawler;

class PantherTest
{
	/**
	 * @var string
	 */
	protected $url;

	
	public function __construct()
	{
		$this->url = 'https://example.com/';
	}
	
	/**
	 * @return Crawler
	 */
	protected function fetchUrlGetCrawler(): Crawler
	{
		$client = Client::createChromeClient();

		$client->request( 'GET', $this->url );

		return $client->getCrawler();
	}

	public function test1()
	{
		$client = Client::createChromeClient();

		$client->request( 'GET', $this->url );

		$crawler = $client->getCrawler();

		dump( $crawler->html() );
	}

	public function test2()
	{
		$crawler = $this->fetchUrlGetCrawler();

		dump( $crawler->html() );
	}
}

The PantherTest::test1 method works as expected:

$test = new PantherTest();
$test->test1();

but the PantherTest::test2 method does not, even though it runs the exact same code, only moved into another method:

$test = new PantherTest();
$test->test2();

I've tried this on both my local dev server and a remote Debian/Apache server, with the same results.

@dunglas
Member

dunglas commented Feb 5, 2021

It's probably because in the second test, the destructor of Client is called when fetchUrlGetCrawler() returns. Maybe we should store a reference to the client in the crawler to prevent this bug.
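
To illustrate the lifetime problem described here without needing a browser, the following is a minimal sketch in plain PHP. FakeSession, FakeClient, FakeCrawler and fetchCrawler() are hypothetical stand-ins, not Panther classes, and the sketch assumes the real Client ends its browser session when it is destroyed. It only shows how PHP runs a destructor as soon as the last reference to an object goes away, which here happens when the helper function returns.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a browser session; not part of Panther.
final class FakeSession
{
    public bool $open = true;
}

// Hypothetical client that owns the session and closes it when destroyed.
final class FakeClient
{
    private FakeSession $session;

    public function __construct()
    {
        $this->session = new FakeSession();
    }

    public function getCrawler(): FakeCrawler
    {
        // The crawler shares the session but holds no reference to the client.
        return new FakeCrawler($this->session);
    }

    public function __destruct()
    {
        $this->session->open = false;
    }
}

// Hypothetical crawler that can only read the page while the session is open.
final class FakeCrawler
{
    private FakeSession $session;

    public function __construct(FakeSession $session)
    {
        $this->session = $session;
    }

    public function html(): string
    {
        return $this->session->open ? '<html>...</html>' : '';
    }
}

function fetchCrawler(): FakeCrawler
{
    $client = new FakeClient();

    // $client is the only reference to the client, so its destructor
    // runs when this function returns and the session is closed.
    return $client->getCrawler();
}

// Same scope: the client is still referenced while html() is called.
$client = new FakeClient();
$sameScope = $client->getCrawler()->html();

// Other scope: the client was destroyed before html() is called.
$otherScope = fetchCrawler()->html();

var_dump($sameScope !== ''); // bool(true)
var_dump($otherScope);       // string(0) ""
```

In this sketch, keeping any reference to the client alive (for example a property on the calling object, or a reference stored in the crawler as suggested above) is enough to delay the destructor.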

@monkeyArms
Author

Ah...that makes complete sense.

My use case was a method that would accept a URL argument and return a Crawler instance regardless of how the Crawler was created (via BrowserKit, via Panther, or populated from a Symfony\Component\HttpClient\HttpClient response).

I ultimately decided to discard the third (HttpClient) option and have the method return a Symfony\Component\BrowserKit\AbstractBrowser instance instead, as the Client is sometimes needed anyway.

I don't know if the Crawler should know about the Client or not. I just know I spent more time than I would have liked tracking down where things went wrong in this instance. Perhaps the Client destructor could at least set a flag on the Crawler, so that an Exception with an explanation is thrown if the Crawler is used after the Client is no longer available?
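
As a rough sketch of the flag idea, here is one way it could look in plain PHP. LivenessFlag, GuardedClient, GuardedCrawler and guardedCrawlerFromHelper() are hypothetical names invented for illustration; none of this is Panther's API or a proposed patch. The client's destructor flips a shared flag, and the crawler checks it before doing any work.

```php
<?php

declare(strict_types=1);

// Hypothetical flag shared between a client and the crawlers it hands out.
final class LivenessFlag
{
    public bool $alive = true;
}

// Hypothetical client; marks the flag as dead when it is destroyed.
final class GuardedClient
{
    private LivenessFlag $flag;

    public function __construct()
    {
        $this->flag = new LivenessFlag();
    }

    public function getCrawler(): GuardedCrawler
    {
        return new GuardedCrawler($this->flag);
    }

    public function __destruct()
    {
        $this->flag->alive = false;
    }
}

// Hypothetical crawler; fails loudly instead of returning an empty string.
final class GuardedCrawler
{
    private LivenessFlag $flag;

    public function __construct(LivenessFlag $flag)
    {
        $this->flag = $flag;
    }

    public function html(): string
    {
        if (!$this->flag->alive) {
            throw new \LogicException(
                'The client that created this crawler has been destroyed; '
                . 'keep a reference to the client while the crawler is in use.'
            );
        }

        return '<html>...</html>';
    }
}

function guardedCrawlerFromHelper(): GuardedCrawler
{
    $client = new GuardedClient();

    // The client is destroyed when this function returns.
    return $client->getCrawler();
}

$message = null;

try {
    guardedCrawlerFromHelper()->html();
} catch (\LogicException $e) {
    $message = $e->getMessage();
}

echo $message, PHP_EOL;
```

Compared with storing a client reference in the crawler, this keeps the two objects decoupled but turns the silent empty result into an explicit error.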

Either way, thank you for your response and the great library.
