#!perl -w

# Pop3proxy - a SpamAssassin enabled POP3 proxy designed for Win32
# users.

use strict;

# Set this to zero to turn off all debugging statements.  Set to 1 for
# basic debugging, which is pretty verbose, set it to 2 to add a dump
# of key data structs on connect, set it to 3 to add a dump of every
# read/write we do. (Oy)
use constant DEBUGGING => 1;

# Seems that SpamAssassin wants to remove the dependency on
# Time::HiRes.  I only need it for measuring performance, so I'll only
# include it if it's available.  Have to eval the "use constant"
# statements to avoid redefinition warnings.
#
# I use constants for debugging switches because I believe they get
# optimized out by the compiler if they're false.  I could be wrong.
BEGIN {
  eval "use Time::HiRes";
  if ($@) {
    eval "use constant TIMERS => 0";
  } else {
    eval "use constant TIMERS => 1";
  }
}

# A set of enumerated reasons why we're snarfing a multiline response
# for a socket.
use constant RETR => 1;
use constant TOP => 2;
use constant CAPA => 3;

use IO::Socket;
use IO::Select;

use FindBin;

use Mail::SpamAssassin;
use Getopt::Long;

#########################
# A BUNCH OF EVIL GLOBALS
#########################

# Set this to be a file that will contain the debug log.  Set to an
# empty string to debug to STDOUT.  --logfile command line arg sets.
my $logfile = 'pop3proxy.log';

# Hostmap - keys are ports to listen to on localhost, values are
# hostname:port to proxy connections on the key port to.  Set up by
# the command line --host arg or by the hostmap.txt config file.
#
# A simplest case - you get your mail from a server server named
# pophost.isp.com, on the standard POP3 port (110):
#
# my %hostmap = ( 110 => 'pophost.isp.com:110' );
#
# ...And you change your mail client to get mail from localhost.
#
# Fancier case - you pop mail off of two hosts, pophost.isp.com and
# mail.yetanother.org:
#
# my %hostmap = (
#    817 => 'pophost.isp.com:110',
#    818 => 'mail.yetanother.org:110',
# )
#
# In that case, the proxy listens to TWO sockets on localhost - 817,
# and 818, proxying off to two separate remote hosts as indicated.
#
# Note that for this to work, you need to be able to tell your mail
# client to connect to two different ports on localhost to find the
# proxy - namely, 817 for pophost.isp.com and 818 for
# mail.yetanother.org.  Some mail clients, like Netscape 4.5's, won't
# let you specify the port to use for a pop3 connection.  Oops.
# Others, like Mozilla 1.0, will let you set the port, but won't allow
# two servers to be on the same host (localhost in this case).  You
# can work around THAT by creating another alias for localhost in your
# C:\Windows\Hosts file:
#
# 127.0.0.1       localhost MyHostName
#
# ...and then configuring one account for localhost:817 and the other
# for MyHostName:818
my %hostmap = ();

# Respect_byte_count - If TRUE, then we do not alter the byte count of
# the message when marking it as spam - instead, we overwrite portions
# of the headers, such as changing the first five characters of the
# Subject: line to "*SPAM*" (a shortened form of SpamAssassin's famous
# subject prefix).  Set by the command line --nopad arguement.
#
# This, because under certain conditions the POP3 protocol indicates
# message and mailbox sizes, and the safe thing is not to enlarge
# those sizes while marking a message as spam.
#
# If there is no Subject: line in the mail headers (there doesn't have
# to be, after all) or if it's less than 5 bytes, then we use the
# first Received: line we find instead.
#
# Setting this value to FALSE (0) seems to work with most mail
# clients, and it causes us to proxy back the mail as it's been
# modified by SpamAssassin, which gives you a wonderful great lot of
# info about WHY it's labeled as spam, and also labels it clearly and
# beyond doubt, and defangs the MIME contents, etc, etc - but it
# *could* break the mail client.  Harumph.
my $respect_byte_count = 0;

# If true, we let the POP3 "TOP" command go thru to the server,
# otherwise, we don't proxy the TOP command and return an error back
# to the client.  Set by the command line --allowtop arguement.
#
# TOP is specified as an optional command, it shows you the headers of
# a mail message and a configurable number of lines of the body.  The
# idea is that you can sort of "screen" what you choose to download or
# not before you do.  All well and good, but our spam filtering can
# cause this to break when we scan the actual message during retrieval
# and potentially modify or add to the headers, such as changing the
# subject line to start with *****SPAM***** or something.
#
# This breaks the protocol a little and could have unusual or possibly
# even destructive consequences.  Since it's an optional part of the
# protocol, most mail clients should be coded to work without it,
# hence, by default, we avoid the problem by turning it off.
my $allow_top = 0;

# Here's the problem with using SpamAssassin in this way - given a
# large enough message, he will take a LONG time to scan it, where
# long is like sixteen minutes on a P-II 350 running Linux for a 3MB
# text message.  Maybe that was a degenerate case of some sort, but
# there it is.  If SpamAssassin takes long enough to scan a message,
# the mail client (who's not getting any data in response to his RETR
# command during all this) will eventually time out.  Sockets close,
# data is lost, etc, etc.  Very bad, very difficult to fix and get on
# with your life if you have a large mail message on the server that
# keeps causing this.
#
# Hence, this config parameter.  If a message exceeds this size while
# we're snarfing it, we'll abandon the snarf, start passing the data
# back to the client, and no scan of the message by SpamAssassin will
# be performed.
#
# Setting this to zero turns this behavior off - all messages will be
# scanned, regardless of size.
#
# I chose a 250K default for this value after analyzing a few months
# worth of spam - 1500 messages.  The average size was about 9K, the
# largest was 110K.  I figured double the largest would allow most of
# the spam we see today to get scanned, without trouble.
#
# This has the added side effect of keeping our memory usage down -
# that scan of a 3MB message took 86MB worth of memory.  That's not
# such a hot idea for a daemon.
my $max_scan_size = 250000;

# If we're invoked with a logfile for output using ActiveState's
# wperl.exe, we can effectively hum along in the background.  Nice.  I
# don't want to send the user to Task Manager to shut us down, and
# under Win98 at least you get the nasty "application not responding"
# dialog box because I'm busy waiting for to select a socket, so
# instead we have this - a port that we listen on for the purposes of
# exiting.  Any connection to it from localhost, and I'll get out of
# town.
#
# The default is 9625 (which is otherwise unused).  Set this to zero
# to disable this behavior.
my $exit_port = 9625;

# ( Not for vanity but to meet the Perl Artistic License)
# Johannes Endres (je@heise.de) added --listenalladdr and necessary code
# on 12/11/2002
#
# Listening on all interfaces may be a security problem.
# For example the german ISP T-Online doesn't use APOP. They use the
# dial-in user data for authentication. Get it? If you have a proxy
# running, anyone can get your E-Mail without the password.
#
# That's why by default we only listen on the loopback iterface.
# The argument --listenalladdr changes this to '0.0.0.0' telling us
# to listen on all interfaces.
#
# Todo: this could be a little more sophisticated with a list of
# interfaces to listen on.
my $LocalAddr = 'localhost';

# Note CRLF == \015\012
my $no_top = "-ERR Not supported by proxy\015\012";

# %peer - mapping of client socket => server socket, and vice versa.
#
# Keys are stringified references to IO::Socket objects, values are
# actual references to the same.  It's a little ugly to contemplate,
# but it works just dandy.
#
# The Peer mapping is removed when the peer is closed.  Thus, if
# you're reading data on $socket:
#
# The destination of this data is $reading_buf{$peer{$socket}}, and,
# If there is no destination any more, there's no point in reading the
# data, so shut down, and,
# If you read some data, add the $peer{$socket} to the Writeable set,
# because now you want to write something to him.
#
# And, if you're writing to $socket,
#
# The data is in $writing_buf{$socket}, and,
# Once all the data is written, you should close $socket if
# $peer{$socket} is missing.
my %peer;

# %is_client - stringified IO::Socket references for keys, true or
# false values based on whether that socket is connected to a client
# or the server.
my %is_client;

#################
# Buffers galore.
#################

# The general flow of data is:
#
# data from $socket -> $peer = $peer{$socket} -> read data into
# $reading_buf{$peer} -> hook protocol, snarfing to $message{$peer} if
# needed -> move data into $writing_buf{$peer} -> write data to $peer

# %reading_buf - keys are sockets, value is buffer of data read from that
# socket's peer, waiting to be proxy'd to the socket.
my %reading_buf;

# %writing_buf - keys are still sockets, value is data from the
# %reading_buf buffer which is now ready for writing to the socket.
my %writing_buf;

# Hash of socket => buffer, buffer is filled up with the message being
# snarfed.  Then the buffer is scanned and modified, then copied into
# $writing_buf{$socket} and flushed back to the client.
my %message;

# Hash of socket => enums, set to the reason we're snarfing a
# multiline response into %message_for array for this socket.  Set to
# zero (false) if we're NOT snarfing this data.
my %snarfing;

# Hash of Client socket => queue of commands the client has requested.
my %client_commands;

# Hash of listening sockets - keys are stringified socket object refs,
# values are the host:port we should proxy connections on that socket
# to.
my %proxyto;

# Flags - toggled on and off to indicate if we're reading a multiline
# response or not.  Keys are sockets.
my %reading_multiline_response;

# Hash - keys are sockets, values are HiRes timer floats.  Used to
# time downloads.
my %snarf_start;

########
# "Main"
########

# Get in your directory
chdir "$FindBin::RealBin";

read_config() if -s "./hostmap.txt";

my $cl_proxyto;
my $helpflag = 0;
my $listenalladdr =0;
usage() unless GetOptions("logfile:s" => \$logfile,
			  "nopad" => \$respect_byte_count,
			  "allowtop" => \$allow_top,
			  "maxscan=i" => \$max_scan_size,
			  "exitport=i" => \$exit_port,
			  "host=s" => \$cl_proxyto,
			  "help" => \$helpflag,
			  "listenalladdr" => \$listenalladdr
			  );

usage() if $helpflag;

$LocalAddr = '0.0.0.0' if ($listenalladdr);

if ($cl_proxyto) {
  warn "WARNING: $cl_proxyto overrides hostmap.txt entry: $hostmap{110}\n"
      if exists $hostmap{110};

  # We're nice to command line users.  If you tag a :port onto your
  # hostname, that's cool, otherwise, you get :110 for free.
  $cl_proxyto .= ':110' unless $cl_proxyto =~ /:\d+$/;
  $hostmap{110} = $cl_proxyto;
}

die "No proxy host!  Use --host or hostmap.txt\n" unless keys %hostmap;

# Prevent concurrent proxies - kill any previous instance
if (IO::Socket::INET->new(PeerAddr => 'localhost',
			  PeerPort => $exit_port,
			  Proto    => "tcp",
			  Type     => SOCK_STREAM)) {
  warn "WARNING: Existing proxy killed\n";
}

if ($logfile) {
  # Redirect stdout and stderr to logfile if specified.

  # Windows strangeness - you can't reopen STDOUT/STDERR successfully
  # under wperl.exe unless you've already closed it.  Go figure.
  close STDOUT;
  close STDERR;

  open(STDOUT, "> $logfile") or die "Can't redirect stdout: $!";
  open(STDERR, ">&STDOUT")   or die "Can't dup stdout: $!";
}

$| = 1;

# The SpamAssassin scanner.
#
# We tell it to use ./user_prefs, and not to try to copy in a default
# if it's not there (because he goes looking for a template file in
# all the usual places to copy over - all the usual *UNIX* places).
# We tell it to run only local tests, because otherwise, you'll get
# complaints and timeouts when it can't find a dcc app to run or it
# can't find a DNS server or the network is the wrong color or
# whatever.  In theory, you could get away with DNS RBL checks, but
# I've had a hard time making Net::DNS work on my Win32 setup.
# Hackers welcome, best of luck.  See notes at end of file.
my $spamtest = Mail::SpamAssassin->new({
  userprefs_filename => './user_prefs',
  dont_copy_prefs => 1,
  local_tests_only => 1,
});

my $readable = IO::Select->new;
my $writeable = IO::Select->new;

# Create sockets to listen on.
foreach my $port (keys %hostmap) {
  my $listener = IO::Socket::INET->new(LocalPort => $port, Listen => 5,
  					LocalAddr => $LocalAddr,
				       Reuse => 1);

  die "Can't create socket for listening: $!" unless $listener;
  print "Listening for connections on port $port (proxy $hostmap{$port})\n"
      if DEBUGGING;

  $readable->add($listener);
  $proxyto{$listener} = $hostmap{$port};
}

# Create the "exit socket" - any connection on this socket from
# localhost will cause us to exit.
my $exit_socket;
if ($exit_port) {
  $exit_socket = IO::Socket::INET->new(LocalPort => $exit_port, Listen => 1,
  					LocalAddr => $LocalAddr,
				       Reuse => 1);
  $readable->add($exit_socket);
}


while(1) {

  my ($toread, $towrite) = IO::Select->select($readable, $writeable);

  foreach my $socket (@$toread) {

    if ($socket == $exit_socket) {
      all_done($socket);
      next; # Just in case it wasn't from localhost
    }

    # Is it a new connection?
    if (exists $proxyto{$socket}) {

      dump_data_structs() if (DEBUGGING > 1);

      # Open connection to remote, add to readable set, map it
      # to this new client connection.
      my $remote = IO::Socket::INET->new(PeerAddr=>$proxyto{$socket});
      $readable->add($remote) if $remote;

      if (not $remote) {
	# Break the incoming new client off, create a new
	# listener to try again.
	print "Connect to remote: $proxyto{$socket} FAILED: $@\n" if DEBUGGING;
	my $port = $socket->sockport;
	$socket->close;
	$readable->remove($socket);
	my $listener = IO::Socket::INET->new(LocalPort => $port,
  					LocalAddr => $LocalAddr,
					     Listen => 5, Reuse => 1);
	die "Can't create socket for listening: $!" unless $listener;
	$readable->add($listener);
	$proxyto{$listener} = $hostmap{$port};
	next;
      }

      # Accept the connection and add it to our readable list.
      my $new_sock = $socket->accept;
      $readable->add($new_sock) if $new_sock;
      die "Can't create new socket for incoming connection: $!"
	  unless $new_sock;

      # Create proxy/peer mapping, set client/server indicators,
      # create buffers, etc.
      $peer{$new_sock} = $remote;
      $peer{$remote} = $new_sock;
      $is_client{$new_sock} = 1;
      $is_client{$remote} = 0;
      $message{$new_sock} = '';
      $snarfing{$new_sock} = 0;

      # The first thing we'll see is a response to no command at
      # all - "+OK Welcome to foobar.com" - so we seed the
      # command queue with a dummy command to eleminate warnings
      # later on.
      $client_commands{$new_sock} = [('none')];
      foreach ($new_sock, $remote) {
	$reading_buf{$_} = '';
	$writing_buf{$_} = '';
      }

      if (DEBUGGING) {
	print "\nNew connection:\n";
	print "From: ", $new_sock->peerhost, ':',
	$new_sock->peerport,"\n";
	print "To:   ", $remote->peerhost, ':',
	$remote->peerport, "\n";
      }

    } else {  # It's an established connection

      my $key;
      if (DEBUGGING) {
	if ($socket->connected) {
	  $key = $socket->peerhost . ':' . $socket->peerport;
	} else {
	  $key = "$socket";
	}
      }
      my $proxy; # Which socket we're going to proxy this data to
      if (exists $peer{$socket}) {
	$proxy = $peer{$socket};
      } else {
	# No peer.
	print "\n$key - peer gone on read" if DEBUGGING;

	# No need to keep hearing about how it's ready to be
	# read - we've got no use for subsequent data.
	$readable->remove($socket);

	# Tear down connection, unless there's data waiting to
	# be written to it - in that case, we'll catch it in
	# writeables and close it when we're done.
	if (! data_waiting($socket)) {
	  print ", nothing to write, closing socket" if DEBUGGING;
	  clean_up($socket);
	}
	print "\n" if DEBUGGING;
	next;
      }

      # Why 4096 bytes?  I dunno.  You got a better buffer size?
      unless (my $n = sysread($socket, $reading_buf{$proxy}, 4096,
			      length($reading_buf{$proxy}))) {
	warn "sysread: $!\n" if not defined $n;
	# Shut down the socket
	print "\n$key - socket close on read" if DEBUGGING;
	clean_up($socket);
	# Remove the proxy map
	delete $peer{$socket};
	delete $peer{$proxy};
	if (! data_waiting($proxy)) {
	  # No pending data - tear down the peer as well.
	  print ", closing peer too" if DEBUGGING;
	  clean_up($proxy);
	}
	print "\n" if DEBUGGING;
	next;
      }

      if (DEBUGGING > 2) {
	$is_client{$socket} ? print "C< " : print "S< ";
	print "\n";
      }

      # Got data from a socket.  Go do something clever with it.
      run_hooks($proxy);
    }		
  } # End of readables

  # Next, do something with each socket ready to write.  Like, write
  # to it.
  foreach my $socket (@$towrite) {

    my $key;
    if (DEBUGGING) {
      if ($socket->connected) {
	$key = $socket->peerhost . ':' . $socket->peerport;
      } else {
	$key = "$socket";
      }
    }

    my $wrote = syswrite($socket, $writing_buf{$socket}) or do {
      warn "syswrite: $!\n";
      print "\n$key - socket close on write" if DEBUGGING;
      clean_up($socket);
      # Remove the proxy map
      if (exists $peer{$socket}) {
	my $proxy = $peer{$socket};
	delete $peer{$proxy};
	delete $peer{$socket};
	
	if (! data_waiting($proxy)) {
	  print ", closing peer too" if DEBUGGING;
	  clean_up($proxy);
	}
      }

      print "\n" if DEBUGGING;
      next;
    };

    if (DEBUGGING > 2) {
      $is_client{$socket} ? print "C> " : print "S> ";
      print "\n";
    }

    # Scrub the just-written data from the buffer
    substr($writing_buf{$socket}, 0, $wrote, "");

    # All done writing?
    if (! length($writing_buf{$socket})) {
      $writeable->remove($socket);

      if (! exists $peer{$socket}) {
	# No peer?  Tear down connection.
	print "\n$key - peer gone after write, closing\n" if DEBUGGING;
	clean_up($socket);
	next;
      }
    }
  } # end of writeables
}


# data_waiting($socket)
#
# Returns true if there's any data waiting to be proxy'd to this socket.
#
# Reason this works - we only check data_waiting() on a socket *after*
# we've closed it's peer.  Closing the peer in clean_up(), below, will
# have the effect of flushing any pending %message buffers (and
# %reading_buf, for that matter) to %writing_buf, and hence, all the
# data which is "waiting" is, in fact, guaranteed to now be waiting.
sub data_waiting {
  my $socket = shift;
  return (length($reading_buf{$socket}) or length($writing_buf{$socket}));
}


# clean_up($socket)
#
# Given a socket, close it, stop selecting it for anything, clean up
# all our structs that refer to it, set the peer if any to flush
# buffers.
sub clean_up {
  my $socket = shift;

  # This socket is history.  If there's a peer, then that peer
  # currently has all the data it's ever gonna get.  Flush that data
  # into the writing_buf and add it to the writeable set.
  #
  # Ok, technically, this *could* burn you if what you were caching
  # away in %message was a multiline TOP response that you were
  # going to discard anyway, and now I'm going to flush it to the
  # client, instead.  Look, the client is going to get an error
  # condition *anyway* because the darn socket is GONE, man, just
  # like that, in the middle of a multiline response!  I will
  # venture to say that no harm will come of this - but if it does,
  # we can always make this behave a lot more like a "last ditch"
  # run_hooks() session.
  if (exists $peer{$socket}) {
    my $proxy = $peer{$socket};
    $writing_buf{$proxy} .= $message{$proxy} if exists ($message{$proxy});
    $writing_buf{$proxy} .= $reading_buf{$proxy};
    $reading_buf{$proxy} = '';
    $message{$proxy} = '';
    $snarfing{$proxy} = 0;
    if (length ($writing_buf{$proxy})) {
      $writeable->add($proxy);
      print "\nFlushing peer on close\n" if DEBUGGING;
    }
  }

  # Note that you can apparently remove a socket more than once from
  # an IO::Select set.  Also you can delete a key/value pair from a
  # hash that doesn't exist.  Love Perl.  DWIM.
  $readable->remove($socket);
  $writeable->remove($socket);
  $socket->close;
  delete $reading_buf{$socket};
  delete $writing_buf{$socket};
  delete $is_client{$socket};
  delete $snarfing{$socket};
  delete $message{$socket};
  delete $client_commands{$socket};
  delete $reading_multiline_response{$socket};
  delete $snarf_start{$socket};
}


# run_hooks($socket)
#
# This is where we hook the POP3 protocol.  Called whenever a socket
# gets new data in it's buffer, we can do whatever you want here.  The
# default is to wait until there's a \n in the %reading_buf buffer, then (in
# a loop) move all those bytes into the %writing_buf buffer (giving us the
# window to look at a full line of I/O), then add the socket to the
# writeable set, thereby causing the contents of %writing_buf to get
# flushed to the socket.
#
# Under certain conditions, though, we'll want to intercept the
# protocol, at which point we snarf the data off into %message until
# it's done, then we look at it or replace it or something, and THEN
# we ship it off to %writing_buf for flushing to the client.
#
# Client commands are pushed onto a queue of commands, server
# responses shift commands off that queue.  This way we can support
# pipelining client/servers, per rfc 2449
#
# Note - logically, the %peer mapping must be intact when you get
# here.  The main loop enforces this.  You may assume that
# $peer{$socket} will exist and be valid in this routine.
my $pos;
sub run_hooks {
  my $socket = shift;

  # This loop looks for the first occurance of a \n in a string,
  # then MOVES all of the string up to and including the \n into the
  # output buffer and adds the socket to the set of sockets we'd
  # like to write to.  Then it loops looking for another \n.
  #
  # Just before the move, you can examine the beginning of
  # $reading_buf{$socket} to see what kinds of interesting thingies might
  # be in there, in the confidence that it's a real full line of
  # data from the protocol.  You can say things like:
  #
  # $reading_buf{$socket} =~ /^(.*)$/m  # /m lets $ match next to embedded \n
  $pos = -1;
  while (($pos = index($reading_buf{$socket}, "\012", 0)) > -1) {
    # Right here you can examine $reading_buf{$socket}
    if ($is_client{$socket}) {
      # Hooks here for data from the server to the client

      # Responses from the server are interesting.  They can be
      # single line, in which case they MUST start with "+OK" or
      # "-ERR", or else they're part of a multiline response,
      # such as a LIST or RETR command, in which case they MUST
      # end with a CRLF.CRLF.

      if ($reading_buf{$socket} =~ /^(\+OK|-ERR)/i
	  and not $reading_multiline_response{$socket}) {

	# Response to a command
	my $command = shift @{$client_commands{$socket}};

	print $peer{$socket}->peerhost . ':' .
	    $peer{$socket}->peerport .
	    " (Server) said $1 to $command\n" if DEBUGGING;
	
	# Always include the greeting line in the log.
	if (DEBUGGING and $command eq 'none') {
	  print $reading_buf{$socket};
	}

	die "Assertion failed: snarfing outside multiline response"
	    if ($snarfing{$socket});

	# Only interested in snarfing successful response -
	# none of the error responses are multiline.
	if (substr ($1, 0, 1) eq '+') {
	  if ($command =~ /^TOP$/i and not $allow_top) {
	    print "Snarfing TOP response\n" if DEBUGGING;
	    $snarfing{$socket} = TOP;
	  }
	
	  if ($command =~ /RETR/i) {
	    print "Snarfing RETR response\n" if DEBUGGING;
	    $snarf_start{$socket} = Time::HiRes::gettimeofday
		if TIMERS;
	    $snarfing{$socket} = RETR;
	  }

	  if ($command =~ /CAPA/i) {
	    print "Snarfing CAPA response\n" if DEBUGGING;
	    $snarfing{$socket} = CAPA;
	  }
	}
	
      } elsif ($reading_buf{$socket} =~ m|^\.\015?\012|) {
	# End of a multiline response

	$reading_multiline_response{$socket} = 0;

	if ($snarfing{$socket}) {
	  print "Detected end of snarfed multiline\n" if DEBUGGING;

	  printf "Download took %.8f seconds\n",
	  Time::HiRes::gettimeofday - $snarf_start{$socket}
	  if (DEBUGGING and TIMERS);

	  # At this point, $message{$socket} contains the
	  # full multiline response, +OK up to but not
	  # including this trailing ".CRLF".

	  if ($snarfing{$socket} == RETR) {

	    # Right here, $message{$socket} is ripe for
	    # scanning.
	    scan_mail(\$message{$socket});
	    $writing_buf{$socket} .= $message{$socket};

	  } elsif ($snarfing{$socket} == TOP) {
	    # Eat the .CRLF, add the error message to the
	    # output buffer, flush said output buffer,
	    # clean up your structs and move on.
	    substr($reading_buf{$socket}, 0, $pos+1, "");
	    $writing_buf{$socket} .= $no_top;
	    $message{$socket} = '';
	    $snarfing{$socket} = 0;
	    $writeable->add($socket);
	    next;
	  } elsif ($snarfing{$socket} == CAPA) {
	    # Strips out the TOP response, if any.
	    $message{$socket} =~ s/\012TOP[^\012]*\012/\012/ig
		if not $allow_top;
	    # Strips out the SASL response, if any.
	    $message{$socket} =~ s/\012SASL[^\012]*\012/\012/ig;
	    $writing_buf{$socket} .= $message{$socket};
	  }
	  $message{$socket} = '';
	  $snarfing{$socket} = 0;
	}
      } else {
	# Part of a multiline response.  Flip the ready flag,
	# you won't be ready to see another response until you
	# see your CRLF.CRLF
	$reading_multiline_response{$socket} = 1;
      }

      # At this point, snarf data into %message if snarfing and
      # move along.
      if ($snarfing{$socket}) {
	$message{$socket} .=
	    substr($reading_buf{$socket}, 0, $pos+1, "");

	# Check size of snarfed message and stop snarfing if it's
	# getting too big - see notes at $max_scan_size.
	if ($max_scan_size != 0 and
	    length($message{$socket}) > $max_scan_size) {

	  print "Message exceeding max scan size, abandoning snarf\n"
	      if DEBUGGING;

	  $writing_buf{$socket} .= $message{$socket};
	  $message{$socket} = '';
	  $snarfing{$socket} = 0;
	  $writeable->add($socket);
	}

	next;
      }

    } else {
      # Hooks here for data from the client to the server

      # Spot the client's command, add to the queue.
      my ($command) = $reading_buf{$socket} =~ /^(\S+)\s/;

      print $peer{$socket}->peerhost . ':' . $peer{$socket}->peerport .
	  " (Client) said $command\n" if DEBUGGING and $command;

      # AUTH is a special case, see discussion elsewhere.  Must
      # not have any commands in the queue, and we reply back to
      # the socket immediately with an error.
      if ($command and $command =~ /^AUTH$/i) {
	if (scalar(@{$client_commands{$peer{$socket}}})) {
	  die "I so can't cope with AUTH commands while pipelining";
	}

	print "AUTH Rejected\n" if DEBUGGING;
	substr($reading_buf{$socket}, 0, $pos+1, "");
	# Note - $no_top is a generic -ERR response, works fine.
	$writing_buf{$peer{$socket}} .= $no_top;
	$writeable->add($peer{$socket});
	next;
      }

      push (@{$client_commands{$peer{$socket}}}, $command) if $command;
    }

    # Default action after all your shots at hooking and magic,
    # etc.: Move the data to the writing buffer, and set it up to
    # get written.

    $writing_buf{$socket} .= substr($reading_buf{$socket}, 0, $pos+1, "");
    $writeable->add($socket);
  }
}

sub dump_data_structs {
  # Dump your current key per-connection data structs
  print "\nExisting proxy/peer mappings:\n";
  print map "$_ => $peer{$_}\n", keys %peer;
  print "\nExisting is_client flags:\n";
  print map "$_ => $is_client{$_}\n", keys %is_client;
  print "Existing socket reading_buf buffers:\n";
  print map "$_ => $reading_buf{$_}\n", keys %reading_buf;
  print "Existing socket writing_buf buffers:\n";
  print map "$_ => $writing_buf{$_}\n", keys %writing_buf;
  print "Existing message buffers:\n";
  print map "$_ => $message{$_}\n", keys %message;
  print "Existing snarfing flags:\n";
  print map "$_ => $snarfing{$_}\n", keys %snarfing;
  print "Existing command queues:\n";
  print map "$_ => @{$client_commands{$_}}\n", keys %client_commands;
  print "Existing reading_multiline_response flags:\n";
  print map "$_ => $reading_multiline_response{$_}\n",
  keys %reading_multiline_response;
  print "Existing snarf_start values:\n";
  print map "$_ => $snarf_start{$_}\n", keys %snarf_start;
}

# @mail - array of lines of a mail message.  Some notes on memory
# usage here:
#
# Big mail messages getting copied about will chew up memory right
# quick.  I start with one copy of the message built up in a scalar
# buffer, then I need a second copy, broken out into an array of
# lines, for Mail::SpamAssassin::NoMailAudit to chew on.  That's two
# copies.
#
# I can save a copy's worth of memory by MOVING the lines from the
# scalar buffer into the array - but then, once SpamAssassin is done
# chewing on them, I have to put them BACK into the scalar buffer.  If
# I'm not removing them from the SpamAssassin::NoMailAudit object as I
# do that, I'm going to wind up with a second copy of the mail
# *anyway*.  And that kind of removal is nasty and creeps inside of
# the objects encapsulation, where I really ought not go.
#
# NoMailAudit::as_string() returns a copy of the mail as a string, but
# to do so, it creates a big ol' scalar on the stack to return.
# Simple, but it costs a THIRD chunk of memory the size of the
# message.
my @mail;

sub scan_mail {
  my $mailref = shift;
  my $bytecount = length $$mailref;

  $$mailref =~ s/\012\.\./\012\./g; # un-byte-stuff

  @mail = split /^/, $$mailref;

  my $response = shift @mail;

  # SpamAssassin::NoMailAudit adds a Unix mbox From_ line, unless
  # you construct your NoMailAudit message with the (ahem,
  # undocumented) add_From_line param set to false.  That From_
  # kinda breaks the protocol - the client isn't expecting mbox,
  # he's expecting raw 822 mail - so we leave it out.
  my $message = Mail::SpamAssassin::NoMailAudit->new(data => \@mail,
						     add_From_line => 0);
  my $start;
  $start = Time::HiRes::gettimeofday if TIMERS;
  my $status = $spamtest->check($message);
  printf "Spam check took %.8f seconds\n",
  Time::HiRes::gettimeofday - $start if (DEBUGGING and TIMERS);

  my $id = $message->get('Message-id') || '*none*';
  print "$bytecount bytes, ",
  $status->is_spam() ? 'SPAM' : 'NOT spam',
  ", Message-id: $id\n" if DEBUGGING;

  print $status->get_report() if DEBUGGING and $respect_byte_count;
  $status->rewrite_mail() unless $respect_byte_count;

  if ($status->is_spam ()) {
    if ($respect_byte_count) {
      # DAN - danger, you don't know if you're in the headers or not.
      $$mailref =~ s/\012Subject: [^\012]{6}/\012Subject: *SPAM*/i or
	  $$mailref =~ s/\012Received: [^\012]{6}/\012Received: *SPAM*/i;
    } else {
      # What as_string() does as of SpamAssassin v2.31:
      #  return join ('', $self->get_all_headers()) . "\n" .
      #                join ('', @{$self->get_body()});
      $$mailref = $response;
      $$mailref .= $message->get_all_headers();
      $$mailref .= "\015\012";
      foreach my $line (@{$message->get_body()}) {
	$$mailref .= $line;
      }
      # SA's markups end with \n instead of CRLF's.  Gotta
      # change those here.
      $$mailref =~ s|(?<!\015)\012|\015\012|g;
    }
  } else {
    if (not $respect_byte_count) {
      $$mailref = $response;
      $$mailref .= $message->get_all_headers();
      $$mailref .= "\015\012";
      foreach my $line (@{$message->get_body()}) {
	$$mailref .= $line;
      }
      # SA's markups end with \n instead of CRLF's.  Gotta
      # change those here.
      $$mailref =~ s|(?<!\015)\012|\015\012|g;
    }
  }
  $status->finish();
  $$mailref =~ s/\012\./\012\.\./g; # byte-stuff
}


sub all_done {
  my $socket = shift;
  my $new_sock = $socket->accept;
  if ($new_sock->peerhost eq '127.0.0.1') {
    print "Connection on exit socket, exiting\n" if DEBUGGING;
    exit;
  } else {
    print "Connection on exit socket from non-local host!\n" if DEBUGGING;
    $new_sock->close;
  }
}


sub read_config {
  open (CONFIG, "./hostmap.txt") or die "Can't read hostmap.txt: $!\n";
  # Straight from the cookbook 8.16
  while (<CONFIG>) {
    chomp;                  # no newline
    s/#.*//;                # no comments
    s/^\s+//;               # no leading white
    s/\s+$//;               # no trailing white
    next unless length;     # anything left?
    my ($port, $proxyto) = split(/\s*=\s*/, $_, 2);
    $hostmap{$port} = $proxyto;
  }
}


sub usage {
  print <<EOT;
Usage: $0 --host some.host.name [options]
Options include:
  --logfile filename
      Use filename as the log file.  Default is pop3proxy.log.  If the
      filename is omitted, log to STDOUT.
  --nopad
      If nopad is specified, then message sizes will not be changed as a
      result of spam scanning.  The default is to add to the message size.
  --allowtop
      If top is specified, then the POP3 "TOP" command will be passed through
      to the server.  The default is to reject client TOP commands with an
      error message.
  --maxscan bytes
      Messages which exceed this size will not be scanned for spam.  The
      default is 250000.  Setting this to zero disables this behavior.
  --exitport port
      Any connection from localhost on this port will cause us to exit.
      The default is 9625.  Setting this to zero disables this behavior.
EOT
  exit;
}

__END__

###################
# DEVELOPMENT NOTES
###################

# From RFC 2449, Extensions to POP3, in the disuccsion on PIPELINING:
#
# Some POP3 clients have an option to indicate the server supports
# "Overlapped POP3 commands." This capability removes the need to
# configure this at the client.
#
# (Note: Usenet suggests that's Eudora.)
#
# 1. The protocol otherwise is implicitly synchronous, if not explicitly.
#
# 2. 2449's CAPA can return TOP, indicating it's available, in which
# case, $allow_top is hosed.  So I have to catch CAPA replies and
# strike out TOP.
#
# Along with catching TOP in CAPA responses, we should also catch
# SASL, to reduce the misconception that we'll be proxying AUTH
# commands (see below).



# From RFC 1734, the POP3 AUTHentitication command
#
# If the client says AUTH foobar, it's suggesting to the server that
# they encrypt the communication using foobar.  That would make my
# life difficult.  I'm not saying it can't be done, just that it's
# more work.  In particular, I'm betting that there are dandy modules
# up on CPAN (Love CPAN) which will do the heavy lifting, but my
# primary target is Win32 boxen where building and installing such
# modules will be a large pain.
#
# So we reject the AUTH command (just like top).
#
# Except that we can't reject AUTH commands "just like TOP" because we
# can't let them get thru - if we let them get thru, then the server
# and client are in a negotiation which we have to break up to be
# clean.
#
# So instead, I'm using the following assumption to facilitate a hack:
#
# I assume that AUTH is a non-pipelined command, always.  This allows
# me to bypass the loop logic of pushing it onto the command queue,
# and then doing something when it comes off - in fact, I will REQUIRE
# that the command queue be EMPTY when I see an AUTH, or else I'll die
# right then and there.
#
# Reasons why this is a safe assumption:
#
# AUTH is only valid in the authentication phase of our show, which
# doesn't lend itself too much to pipelining.  But even more
# compelling is that AUTH implicitly sets up a challange/response gig
# with the server, for which pipelining is useless.



# We do not provide DNS RBL checking, although in theory this should
# be useable if if you also go to the trouble of installing the
# Net::DNS module and getting that to work.
#
# Note that network delays due to RBL or checksum lookups, which are
# masked as overall delivery delays when run at an MTA or before
# getting to a mail spool, are perceived as very real delays by the
# user when run during a POP3 session.  That's an unavoidable price we
# have to pay for filtering during a POP session, I'm afraid.  See
# also the notes on $max_scan_size for more on the topic of time and
# delay.
#
# So, to sum up network checks:
#
# -DNSRBL - Should work, with Net::DNS.  Set
#
# $Mail::SpamAssassin::PerMsgStatus::IS_DNS_AVAILABLE = 1;
#
# This will short circuit is_dns_available() if you still leave
# local_tests_only set to true.  That way, you get DNS RBLs w/o the
# overhead of discovering that DCC and Razor aren't available every
# time thru.  Note, though, that that's a true hack and the next
# release of SpamAssassin should punish you by breaking it.
#
# DCC - Nope, it's looking for an external process.  Yuck.
#
# -Razor - not easily, requires MANY modules, like Mail::Internet,
# Digest::SHA1, Net::Ping (which is broken under activestate's distro
# in some cases)... blech.


=head1 NAME

Pop3proxy - a SpamAssassin-enabled POP3 proxy designed for Win32 users.

=head1 SYNOPSIS

C<wperl.exe c:\pop3proxy\pop3proxy.pl -host some.host.name [-logfile
filename] [-nopad] [-allowtop] [-maxscan bytes] [-exitport port]>

C<wperl.exe c:\pop3proxy\kill_proxy.pl>

=head1 DESCRIPTION

=head2 OVERVIEW

If you read email on a Win32 platform with a POP3 mail client, it can
be a little challenging to filter your mail.  One of the best spam
detection solutions available today is SpamAssassin, which is
(unfortunately for Windows users) a little biased towards Perl and
Unix.

This proxy is designed to allow a Win32 POP3 mail client to use
SpamAssassin to filter their mail.  It does so by standing between the
user's mail client and the POP3 server, seamlessly proxying data back
and forth between them.  When a mail message is retrieved, the proxy
waits to read the full message before handing it to the client.  After
it has the full message, it uses SpamAssassin to check to see if it's
spam, marks up the message accordingly, and then returns it to the
client.  The client can then easily filter the mail based on
SpamAssassin's markups, such as the inclusion of "*****SPAM*****" at
the beginning of the subject line.

Installing this proxy is not for the faint of heart - SpamAssassin is
not supported on Win32 platforms, and you'll need to be comfortable
with downloading, uncompressing, installing, and copying files, as
well as editing text configuration files.  You might need to know a
little about networking, and if your client is not one we've tested
with, you might need to learn a little about the POP3 protocol.  But
generally speaking, anyone comfortable with installing Perl on their
Win32 platform will be able to make use of this proxy.  See
L<"INSTALLING">, below.

If all this is a little daunting, Pop3proxy isn't for you.  Consider
one of the fine alternatives available from Deersoft inc., instead,
for your Windows based SpamAssassin solution.  You can find them at
http://www.deersoft.com.

Pop3proxy supports most of the POP3 protocol as per RFCs 1939 (POP3
protocol) and 2449 (Extensions to POP3).  Since marking up a message
in transit between the POP3 server and the mail client can "break" the
protocol, there are a variety of options to help your client deal with
this.  Pop3proxy was tested with a number of popular Win32 mail
clients, which were found to be pretty good at dealing with the
protocol being "broken" in this way, but your mileage may vary.

Notably, Pop3proxy does not support RFC 2222, Simple Authentication
and Security Layer (SASL), nor RFC 1734, POP3 AUTHentication command.
By default, the optional POP3 C<TOP> command is not supported, but you
can override this (see B<--allowtop> in L<"OPTIONS">, below).  C<SASL>
and C<TOP> are also removed from the results of an RFC 2449 C<CAPA>
command as well.  RFC 2449 Pipelining is supported.

SpamAssassin is a very full featured mail analysis package, but due to
the constraints of the Win32 platform and the nature of being a
network proxy, Pop3proxy provides ONLY basic SpamAssassin text
scanning and markup.  Razor, DCC, and the auto_whitelist feature are
not supported.  DNS based RBL checking is not supported.

Pop3proxy also does not report any detected spam - if you want to
complain about spam, that's up to you, we just try to spot it.

=head2 INSTALLING

=over 4

=item * Download Pop3proxy

Download the C<pop3proxy.pl> script and save it in its own directory,
for example, C<c:\pop3proxy>.  (The rest of this document will refer
to C<c:\pop3proxy>, but naturally, you can install it anywhere you'd
like.)

=item * Install Perl

You need Perl for your Win32 box.  Here you should acquaint yourself
with the good people at www.ActiveState.com if you're not already
familiar with them.  Pop3proxy was written against ActiveState Perl
v5.6.1 build 631, and probably requires a pretty modern Perl.
Download and install their Perl if you don't already have it.

=item * Install Time::HiRes

SpamAssassin, as of this writing, requires the Time::HiRes module.
(Pop3proxy itself will also use it if it's available.)  Using
ActiveState's C<ppm> package manager to install Time::HiRes worked
fine for me.  See the documentation that comes with ActiveState for
more details on using C<ppm>.

=item * Download SpamAssassin

Download SpamAssassin from http://spamassassin.org/.  As of this
writing, it was available as both a .tar.gz or a .zip, you can use
whichever you're more comfortable working with on Win32.  Pop3proxy
was written using SpamAssassin v2.31.

=item * Install SpamAssassin Manually

Unpack the Mail::SpamAssassin distribution.  Since trying to get the
module to build via the traditional "make" method on a Win32 platform
can be difficult, we try to get away with a "manual" installation.
- From the unpacked distribution, copy C<lib\*> to C<c:\perl\site\lib>,
assuming you've installed Perl into C<c:\perl>.  Since the bits of the
SpamAssassin package that Pop3proxy uses are pure Perl (as of this
writing), that should do it.

=item * Install SpamAssassin Rules

Next, also from the SpamAssassin distribution, copy the C<\Rules>
directory and all it's contents to C<c:\pop3proxy\Rules>.

=item * Install SpamAssassin user_prefs

Finally, copy C<c:\pop3proxy\Rules\user_prefs.template> to
C<c:\pop3proxy\user_prefs>.  This is your SpamAssassin prefs file, you
can use it to configure the SpamAssassin scanner in Pop3proxy.  See
the SpamAssassin docs for more details.

=item * Configure Shortcuts

At this point, you might create a C<hostmap.txt> file if you need to
proxy to more than one server.  See L<"CONFIGURING CLIENTS AND
SERVERS">, below, for details.

Otherwise, create a shortcut to launch the proxy.  ActiveState Perl
installs the undocumented C<wperl.exe> in the C<\perl\bin> directory.
This is a version of the Perl interpreter for Windows that does not
require a "console" or "DOS Box" window to run, which makes it nice
for running processes like Pop3proxy without leaving an extra window
lying around.  You can create a shortcut to the proxy that invokes it
like this:

   wperl.exe c:\pop3proxy\pop3proxy.pl --host my.pop3.host

...replacing C<my.pop3.host> with your POP3 server's name, of course.

This shortcut can be launched any way you like - from a desktop icon,
or a start menu entry, or even from your StartUp folder, if you want
to run it all the time.  The important thing, of course, is to make
sure it's running before you attempt to fetch your mail.

You can stop the proxy process a number of ways.  The cleanest is to
use the C<kill_proxy.pl> script, which comes with the proxy.  If run
on the same host as the proxy, it will cause the proxy to exit.  You
should also be able to use the Win32 task manager to stop the proxy,
which will show up as WPERL or PERL, depending on how you ran it.  If
you ran it from a DOS prompt using C<perl.exe>, you should be able to
use C<Control-c> to kill the proxy, although some users have reported
that this doesn't work all the time.

=item * Configure Mail Client

All done!  You must configure your mail client to use the proxy
instead of your POP3 server.  See L<"CONFIGURING CLIENTS AND
SERVERS">, below, for details.

=back

=head2 CONFIGURING CLIENTS AND SERVERS

The default port for POP3 is 110.  If you specify a host on the
Pop3proxy command line using the C<--host> flag, then the proxy will
proxy connections on port 110 on your local machine to port 110 on the
host specified.  This is the simplest case.

For example, if you normally get your mail from a host called
C<mailbox.isp.com>, you can invoke Pop3proxy like this:

   wperl.exe c:\pop3proxy\pop3proxy.pl --host mailbox.isp.com

...and then change where your mail client thinks the server is.  Where
you had configured C<mailbox.isp.com>, change it instead to use the
host named C<localhost>.  The mail client will connect to the proxy,
and the proxy will pass the data along to C<mailbox.isp.com> for you.

If you have more than one account on different POP3 servers, you'll
need to create a C<hostmap.txt> file in the C<c:\pop3proxy> directory
which will describe which local ports the proxy should map to which
remote hosts and ports.  The format of this file is one mapping per
line of the form:

   localport = remote.host.name:remoteport

Leading and trailing whitespace is ignored, and you can use the Perl
commenting convention of "C<#>" to end-of-line for comments.  So, for
example:

   # My POP3 servers, how I love them.
   110 = mailbox.isp.com:110
   818 = mail.another.place.org:110

This will cause connections on the local port 110 to proxy to
C<mailbox.isp.com> on port 110, and connections on port 818 to proxy
to host C<mail.another.place.org>, also on port 110.

Note that the command line C<--host> flag overrides the C<hostmap.txt>
entries for port 110, if any.

Pop3proxy will gladly listen on any local port you configure, but I
suggest choosing one not already in use by someone else.  That's
increasingly difficult, but these guys might help:
http://www.portsdb.org/

Of course, you also need to be able to configure your mail client to
talk POP3 on a port other than 110.  Most modern mail clients can be
configured this way, see L<"TESTED MAIL CLIENTS">.  A bigger issue,
though, is trying to convince some clients that "C<localhost:110>" and
"C<localhost:818>" are I<different> servers with different accounts on
them, and some mail clients are deeply confused by this - again, see
the table.  If your mailer is confused, the following workaround
should do the trick:

Create an alias for localhost in your C<C:\Windows\Hosts> file.  For
example, you might configure:

   127.0.0.1       localhost jester

which would allow you to configure your mail client to get mail from
C<localhost:110>, and C<jester:818>, both of which are actually
Pop3proxy running locally.  This simple subterfuge seems to make even
persnickety mail clients happy.

=head1 OPTIONS

=over 4

=item B<--host>=I<hostname[:port]>

Specify the host to proxy a single connection to.  Pop3proxy will
proxy local port 110 to this host.  You can optionally specify a
remote port, or the default of 110 will be used on the remote as well.
This option is in addition to any hosts configured in C<hostmap.txt>,
and will override any mapping configured in that file to listen on
port 110.

=item B<--logfile>=I<filename>

Specify the name of the file that Pop3proxy should use for logging.
The default is C<pop3proxy.log>.

This file will be created in the same directory as the script, and is
overwritten each time the proxy starts.  If you use this option but
omit the filename, Pop3proxy will log to STDOUT, which is useful for
watching things happen in a console window.

By default, Pop3proxy doesn't log much.  If you'd like more debugging
information in your log, you can edit the C<pop3proxy.pl> script and
change the value of the constant C<DEBUGGING> near the top from the
default of zero to 1, 2, or 3.  See the comments in the code for
debugging levels.

=item B<--nopad>

Specifying this option tells Pop3proxy not to change the size of a
mail message after scanning it.  The default is to increase the size
of the message by adding headers describing the results of the
SpamAssassin scan.  Some mail clients may break without this.

The POP3 protocol has some elements where the server indicates the
size of the mailbox and the size of individual messages.  The RFC
warns against depending on these counts when writing a POP3 client,
but some clients might try to anyway.  In that case, adding bytes to
the message to mark the message as spam, indicate what tests hit,
etc., would cause the message size to increase past what the client
was expecting.

Using this flag disables this behavior.  So far, we haven't seen any
mail clients that require this flag, but YMMV.

If this flag is set, Pop3proxy will flag mail as spam by
I<overwriting> the first six bytes of the C<Subject:> header with the
string C<*SPAM*>.

So:

   Subject: Fabulous Nekkid Chix For You!

becomes

   Subject: *SPAM*us Nekkid Chix For You!

Normal behavior is to I<add> the default SpamAssassin subject flag to
the beginning of the C<Subject:> header (along with other headers and
indicators).

If there is no C<Subject:> line in the mail headers (there doesn't
have to be, after all) or if it's less than 6 bytes, then we overwrite
the first six bytes of the first C<Received:> line we find instead.
C<Received:> headers aren't required, either, but you can almost
always count on one, especially if the mail is from the outside world.
(And if you're getting spammed by someone on localhost, your problems
are beyond the scope of this little program to address.  :-) Not all
mail clients can filter mail based on the contents of the C<Received:>
headers.

If this flag is set and the C<DEBUGGING> constant is true (see details
under B<--logfile> in L<"OPTIONS"> above) then the results of the
SpamAssassin scan will be dumped to the logfile for review.

=item B<--allowtop>

This flag will cause Pop3proxy to proxy the optional POP3 C<TOP>
command to the server.  The default is to decline to proxy it by
returning an error response back to the client.

The POP3 protocol implements an optional command called C<TOP>, which
returns the top of a mail message including the headers.  Servers that
don't support it are to return an error, but some mail clients may
require it.  The danger of proxying it is that the contents of the
headers as returned by the C<TOP> command may be different from the
headers after Pop3proxy has scanned the full mail and marked it as
spam.  This may or may not cause problems for the client, but just to
be safe, by default we pretend it's not supported by the server.

If your mail client isn't working with Pop3proxy, enable debugging as
outlined under B<--logfile>, above, and check your logs for
indications that your client is demanding the C<TOP> command.  This
will probably look something like the client exiting after the message
C<(Client) said TOP>.  In that case, this option might allow you to
use Pop3proxy - or you may experience weird or destructive behavior.

=item B<--maxscan>=I<bytes>

This option lets you set the maximum size of a message which will be
scanned for spam.  The default is 250000 bytes.

SpamAssassin has some well-known performance problems triggered by
very large mail messages.  Most users work around this by selecting a
message size over which they won't feed mail to SpamAssassin.

In the context of a network proxy like this, the problem is
particularly acute.  If we spend too much time scanning a message, the
client will timeout waiting for their network read to complete and
will probably abort the operation in all kinds of unpleasant ways.  As
a bonus, the message will likely remain on the server and cause the
problem I<again> if you retry.  It's just not pretty, hence this
limit.

As an added bonus, this keeps the proxy's memory usage down, since
memory use is tied to the size of the message you're buffering and
scanning.  This is polite behavior for a long-lived process like a
network proxy.

The default was selected after examining 1500 spam messages.  They
averaged 9K bytes, with a max of 110K bytes.

Setting this value to zero disables this feature, and is not advised.

=item B<--exitport>=I<port>

This option lets you specify what port Pop3proxy will listen on for
the purposes of exiting.  The default is 9625.  Any connection from
C<localhost> on this port will cause Pop3proxy to exit cleanly.  (This
is what C<kill_proxy.pl > does.)

Pop3proxy is a pure Perl program, with few if any hooks into the Win32
environment - in fact, he'll probably run just fine on any Perl
supported platform.  His main loop is blocked on C<select()>, waiting
for a socket to be ready to have data read or written, which
unfortunately isn't how Win32 apps are supposed to behave - their main
loops should be processing Windows events.  The "exit port" feature
allows us to get an exit "event" of sorts cleanly without a true Win32
event loop.

If you change this setting, you'll probably want to modify
C<kill_proxy.pl> as well.  Setting this value to zero disables this
feature.

=back

=head1 DIAGNOSTICS

All Pop3proxy messages are written to the logfile, or to STDOUT (see
B<--logfile> in L<"OPTIONS">, above).  The logfile is overwritten each
time Pop3proxy starts.

Setting the debugging flag to 1 will result in messages about clients
connecting and leaving, as well as some information about the protocol
exchange.  Each client command detected is logged as:

   127.0.0.1:1246 (Client) said RETR

and each server response is logged as:

   204.127.5.31:110 (Server) said +OK to RETR

The act of intercepting a multiline server response (such as a mail
message) before passing it back to the client is called "snarfing" in
Pop3proxy.  Many log messages will indicate what is being done with a
snarfed response - beginning, detecting the end (a C<CRLF.CRLF>
sequence), aborting if it exceeds the max configured size, etc.

If the Time::HiRes module is available, Pop3proxy will use it to time
message downloads and SpamAssassin scans, so that you can see where
the delay is if you're experiencing network timeouts on the client.

The results of a spam scan are logged, along with the Message-Id
header and size of the message as downloaded from the server.

If the B<--nopad> option is enabled, the results of all SpamAssassin
scans are printed to the log.

If you bump DEBUGGING up, Pop3proxy will log dumps of its internal
data structures each time a client connects.

If you bump it up further, you'll get logs of every read and write
from every client and server.  For hardcore debugging only, see the
comments in the code.

=head1 FILES

All of these files are found in the Pop3proxy install directory, for
example, C<c:\pop3proxy\>, following installation and configuration.

F<pop3proxy.pl>, F<kill_proxy.pl> - The Perl scripts that make up the
proxy and a program to kill it.

F<hostmap.txt> - An optional configuration file used to map local
ports to the remote host and remote port that they should be proxy'd
to.  See L<"CONFIGURING CLIENTS AND SERVERS">, above, for details and
examples.  F<hostmap.sam>, a sample hostmap file, is provided with the
distribution.

F<user_prefs> - The SpamAssassin C<user_prefs> file, which must be
installed by hand.  See L<"INSTALLING">, above.

F<pop3proxy.log> - Default log file, created by Pop3proxy.  See
B<--logfile> under L<"OPTIONS">, above to change this.  See
L<"DIAGNOSTICS">, above, for notes on contents.

=head1 CAVEATS

You may need to tinker with the default settings to get Pop3proxy to
work with your mail client, see L<"OPTIONS"> B<--nopad> and
B<--allowtop> in particular.  Mail may be lost while tinkering, so be
careful if you're venturing into unknown territory.  Consult L<"TESTED
MAIL CLIENTS">, below, to see what kind of territory you're in.

SpamAssassin is not perfect, but it's pretty darn good.  Adjust your
expectations accordingly.

=head1 BUGS

If using B<--nopad> and the mail does not contain a C<Subject:> header
but it DOES contain the string C<subject:> somewhere in the body, the
spam marking is incorrectly placed in the body rather than in a
C<Received:> header.  The C<$respect_byte_counts> regex will need to
be fixed someday to detect end of headers to fix this.

C<Control-c> won't always kill the proxy if it's running in a DOS box.

=head1 RESTRICTIONS

A client that tries to use the AUTH command while pipelining will
cause Pop3proxy to die with a message.  This is unlikely (and
irrational) behavior that we just can't deal with.

=head1 TESTED MAIL CLIENTS

 Client:              |Netscape|Mozilla|Outlook|Express|Eudora|Pegasus
 Version:             |  4.5   |  1.0  | 2002  |   6   |4.2.3 | 4.01
 Platform:            | Win98  | Win98 | WinXP | WinXP | NT4  | Win98
 _____________________|________|_______|_______|_______|______|_______
 Filter on:           |        |       |       |       |      |
  Subject header?     |  YES   | YES   | YES   | YES   |YES   | YES
  Received header?    |  YES   | YES   | NO    | NO    |YES[2]| NO
 Require TOP command? |  NO    | NO    | NO    | NO    | NO   | NO
 Use Pipelining?      |  NO    | NO    | NO    | NO    | NO   | NO
 Configure POP port?  |  NO    | YES   | YES   | YES   | NO   | YES
 Multiple POP servers?|  NO    | YES[1]| YES[6]| YES[6]| NO   | YES[5]
 Handles padded mail? |  YES   | YES   | YES   | YES   |YES[3]| YES[4]

 [1] Cannot be same host, must use alias in HOSTS file
 [2] Technically, on <any header>
 [3] Eudora seems to use byte count to limit what messages get
 downloaded, if you want, but doesn't seem affected by having the
 count increased in flight.
 [4] Pegasus reports size based on actual size downloaded, suggesting
 that it's not relying on server supplied sizes.
 [5] Pegasus must have separate hosts names for each identity to use
 multiple accounts, must use alias in HOSTS as in [1]
 [6] Outlook/Express can be configured to use the same host on two
 ports without difficulty.

=head1 SEE ALSO

http://spamassassin.org/ SpamAssassin homepage

http://www.activestate.com Providers of Perl on Win32 platforms

http://www.openhandhome.com/howtosa.html How To Use SpamAssassin on
Win32

http://nickdafish.com/SAPP.htm An earlier POP3 proxy for
SpamAssassin.

=head2 RFCs

http://www.faqs.org/rfcs/rfc1939.html RFC 1939, POP3 Protocol

http://www.faqs.org/rfcs/rfc2449.html RFC 2449, POP3 Extension
Mechanism

http://www.faqs.org/rfcs/rfc1734.html RFC 1734, POP3 AUTHentication
command

http://www.faqs.org/rfcs/rfc2222.html RFC 2222, Simple Authentication
and Security Layer

http://www.faqs.org/rfcs/rfc3206.html RFC 3206, The SYS and AUTH POP
Response Codes

=head1 AVAILABILITY

This document can be found at http://mcd.perlmonk.org/pop3proxy/

The package distribution can be found at:

http://mcd.perlmonk.org/pop3proxy/pop3proxy.zip

=head1 CHANGES

10 August 2002 - 1.0 Initial release

=head1 AUTHOR

Dan McDonald C<E<lt>McD@att.netE<gt>>

Dedicated to Lisa, who put up with me hacking on this for many lonely
nights.

Thanks to:

   The SpamAssassin creators and contributors,
   All the helpful coders on PerlMonks and Usenet,
   Larry Wall for being an all around decent human.

Special thanks to Marc and Alan for a bundle of help testing this
beast, and to Klar for numerous (great|evil) ideas.



=head1 COPYRIGHT

           Copyright (c) 2002, Dan McDonald. All Rights Reserved.
        This program is free software. It may be used, redistributed
        and/or modified under the terms of the Perl Artistic License
             (see http://www.perl.com/perl/misc/Artistic.html)

=cut
