resuming after stop() is broken #91

@klkvsk

I'm trying to write a wrapper around this lib that can be used as a generator. Basically, I want to yield every JSON object I need from a stream.

The lib does not use yields or generators, so this is what I came up with:

First, a simple listener for this example: let's collect all values from "foo" fields, assuming they are all scalar:

class FooJsonListener extends \JsonStreamingParser\Listener\IdleListener {
    protected $isReadingFoo = false;
    protected $onFooCallback;
    
    public function onFoo(callable $onFooCallback)
    {
        $this->onFooCallback = $onFooCallback;
    }

    public function key(string $key): void
    {
        $this->isReadingFoo = ($key === 'foo');
    }

    public function value($value): void
    {
        if ($this->isReadingFoo) {
            call_user_func($this->onFooCallback, $value);
            $this->isReadingFoo = false;
        }
    }
}

Now this function starts the parser, stops it every time a foo value has been collected, yields that value, then resumes parsing:

function fooIterator($stream) {
    $jsonListener = new FooJsonListener();
    $jsonParser = new JsonStreamingParser\Parser($stream, $jsonListener);

    $lastFoo = null;
    $shouldYieldFoo = false;
    $jsonListener->onFoo(function ($foo) use (&$lastFoo, &$shouldYieldFoo, $jsonParser) {
        $lastFoo = $foo;
        $shouldYieldFoo = true;
        $jsonParser->stop();
    });

    while (true) {
        $jsonParser->parse();
        if ($shouldYieldFoo) {
            yield $lastFoo;
            $shouldYieldFoo = false;
        } else {
            break;
        }
    }
}

It looks awful, but it should work. It does not, though: after the first yield, the next parse() throws:

JsonStreamingParser\Exception\ParsingException: "Parsing error in [1:1]. Expected ',' or ']' while parsing array. Got: {"

It works if we don't interrupt it with stop(), though.

The problem:
parse() reads the stream in chunks but parses each chunk char by char. Any char may trigger a call to the listener to register a parsed token. After every char, the parser checks the stopParsing flag, which stop() raises.
So in most cases, calling stop() from the listener interrupts parsing in the middle of a chunk. The rest of that chunk is discarded, and the next parse() call continues from the next chunk, not from where parsing stopped.
Also, stopParsing is never reset to false.
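
To make the failure mode concrete, here is a minimal toy model of it. It is not the library's code: ToyChunkedParser, its chunk size and its per-char callback are all made up for illustration. It only mimics the three behaviours described above (chunked reads, a per-char flag check, a flag that is never reset).

```php
<?php
// Toy model of the behaviour described above. This is NOT the library's code:
// ToyChunkedParser and everything in it are hypothetical.
final class ToyChunkedParser
{
    /** @var resource */
    private $stream;
    /** @var int */
    private $chunkSize;
    /** @var callable */
    private $onChar;
    /** @var bool */
    private $stopParsing = false; // never reset, as in the report

    public function __construct($stream, int $chunkSize, callable $onChar)
    {
        $this->stream = $stream;
        $this->chunkSize = $chunkSize;
        $this->onChar = $onChar;
    }

    public function stop(): void
    {
        $this->stopParsing = true;
    }

    public function parse(): void
    {
        while (!feof($this->stream)) {
            $chunk = fread($this->stream, $this->chunkSize);
            if ($chunk === false) {
                return; // read error: give up
            }
            $length = strlen($chunk);
            for ($i = 0; $i < $length; $i++) {
                ($this->onChar)($chunk[$i]);
                if ($this->stopParsing) {
                    // The rest of $chunk has already left the stream and is lost here.
                    return;
                }
            }
        }
    }
}

$stream = fopen('php://memory', 'r+');
fwrite($stream, 'abcdefgh');
rewind($stream);

$seen = '';
$parser = null;
$parser = new ToyChunkedParser($stream, 4, function (string $char) use (&$seen, &$parser): void {
    $seen .= $char;
    if ($char === 'b') {
        $parser->stop();
    }
});

$parser->parse(); // sees "a", "b", then stops; "cd" is dropped with the chunk
$parser->parse(); // starts at the next chunk; the stale flag stops it after "e"
echo $seen, PHP_EOL; // prints "abe"
```

With a chunk size of 4, the chars "c" and "d" never reach the callback, which is the same kind of gap that makes the real parser see an unexpected token after resuming.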

Possible fixes:
a. Store the number of bytes actually consumed on stop(), and fseek() to that offset on the next parse() (this would not work with non-seekable streams, though)
b. Better: store the leftover of the chunk on stop() and prepend it to the data read on the next parse()
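
A rough sketch of what option (b) could look like, again on a toy parser rather than the real one. ResumableToyParser and its $leftover property are hypothetical names, not part of the library. The sketch consumes the saved leftover before reading new data, which has the same effect as prepending it, and it also resets the flag at the start of parse().

```php
<?php
// Toy sketch of fix (b). NOT the library's code: all names are hypothetical.
final class ResumableToyParser
{
    /** @var resource */
    private $stream;
    /** @var int */
    private $chunkSize;
    /** @var callable */
    private $onChar;
    /** @var bool */
    private $stopParsing = false;
    /** @var string unparsed tail of the chunk that stop() interrupted */
    private $leftover = '';

    public function __construct($stream, int $chunkSize, callable $onChar)
    {
        $this->stream = $stream;
        $this->chunkSize = $chunkSize;
        $this->onChar = $onChar;
    }

    public function stop(): void
    {
        $this->stopParsing = true;
    }

    public function parse(): void
    {
        // Reset the flag so an earlier stop() does not end this call at once.
        $this->stopParsing = false;

        while ($this->leftover !== '' || !feof($this->stream)) {
            if ($this->leftover !== '') {
                // Resume with the bytes saved by the previous stop().
                $chunk = $this->leftover;
                $this->leftover = '';
            } else {
                $chunk = fread($this->stream, $this->chunkSize);
                if ($chunk === false) {
                    return; // read error: give up
                }
            }
            $length = strlen($chunk);
            for ($i = 0; $i < $length; $i++) {
                ($this->onChar)($chunk[$i]);
                if ($this->stopParsing) {
                    // Keep the unparsed tail instead of discarding it.
                    $this->leftover = (string) substr($chunk, $i + 1);
                    return;
                }
            }
        }
    }
}

$stream = fopen('php://memory', 'r+');
fwrite($stream, 'abcdefgh');
rewind($stream);

$seen = '';
$parser = null;
$parser = new ResumableToyParser($stream, 4, function (string $char) use (&$seen, &$parser): void {
    $seen .= $char;
    if ($char === 'b' || $char === 'f') {
        $parser->stop();
    }
});

$snapshots = [];
for ($call = 0; $call < 3; $call++) {
    $parser->parse();
    $snapshots[] = $seen; // what the callback has seen after each parse() call
}
echo implode(' ', $snapshots), PHP_EOL; // prints "ab abcdef abcdefgh"
```

Unlike option (a), this does not need a seekable stream, because nothing is ever re-read from it.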

Sadly, neither of these can be implemented by extending the classes, because the relevant members are private.
