Protocol-okhttp: implement IP filter #1107

@jnioche

See NUTCH-2930

To avoid leaking information into a public search index or web archive, it should be possible to configure Nutch so that no content is fetched from localhost, loopback addresses, or private address spaces.

NUTCH-2527 adds the configuration snippets to exclude URLs pointing to private addresses.

However, filtering URLs isn't enough, because the DNS entry of an arbitrary host name may point to a private IP address. Blocking must happen at the protocol level because the IP address is only known in the protocol implementation. I'll add an implementation for protocol-okhttp.
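
To illustrate the kind of check meant here, below is a minimal sketch that filters already-resolved addresses using only `java.net.InetAddress`. The class name `IpFilterSketch` and the methods `isBlocked`, `isUniqueLocalIpv6` and `filterAllowed` are hypothetical names chosen for this example; this is not the actual protocol-okhttp code, and the set of blocked ranges is an assumption that would need to be configurable.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not the Nutch implementation).
class IpFilterSketch {

    /** True for loopback, wildcard, link-local and private (site-local) addresses. */
    static boolean isBlocked(InetAddress addr) {
        return addr.isLoopbackAddress()     // 127.0.0.0/8, ::1
            || addr.isAnyLocalAddress()     // 0.0.0.0, ::
            || addr.isLinkLocalAddress()    // 169.254.0.0/16, fe80::/10
            || addr.isSiteLocalAddress()    // 10/8, 172.16/12, 192.168/16, fec0::/10
            || isUniqueLocalIpv6(addr);     // fc00::/7, not covered by isSiteLocalAddress()
    }

    /** IPv6 unique local addresses (fc00::/7): the first 7 bits are 1111110. */
    static boolean isUniqueLocalIpv6(InetAddress addr) {
        byte[] b = addr.getAddress();
        return b.length == 16 && (b[0] & 0xfe) == 0xfc;
    }

    /**
     * Keeps only the addresses that are not blocked. Throws UnknownHostException
     * if nothing remains, so the caller treats the host as unresolvable.
     */
    static List<InetAddress> filterAllowed(String hostname, List<InetAddress> resolved)
            throws UnknownHostException {
        List<InetAddress> allowed = new ArrayList<>();
        for (InetAddress addr : resolved) {
            if (!isBlocked(addr)) {
                allowed.add(addr);
            }
        }
        if (allowed.isEmpty()) {
            throw new UnknownHostException(hostname + ": all resolved addresses are blocked");
        }
        return allowed;
    }
}
```

In protocol-okhttp, one possible place to apply such a filter would be a custom `okhttp3.Dns` passed to `OkHttpClient.Builder.dns(...)`, so that the check runs on the addresses actually used for the connection. That wiring is an assumption and not shown here.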
