Unicode output does not roundtrip #105

Open
lenianiva opened this issue Aug 17, 2023 · 3 comments

@lenianiva

When I send the Unicode character "∀" through cat, the output does not round-trip:

Input:
∀
226 136 128 
Output:
��
195 162 194 136 194 128 

Code:

use rexpect::spawn;
use rexpect::error::*;

fn display(s: &str)
{
	println!("{}", s);
	for b in s.as_bytes()
	{
		print!("{} ", b);
	}
	println!("");
}
fn repl() -> Result<(), Error>
{
	let mut p = spawn("cat", Some(1000))?;

	let ex: String = "∀".to_string();
	p.send_line(&ex)?;
	let line = p.read_line()?;

	println!("Input:");
	display(&ex);
	println!("Output:");
	display(&line);
	Ok(())
}
fn main()
{
	repl().unwrap_or_else(|e| panic!("repl failed with {}", e));
}
@lenianiva

This seems to be a problem with the reader, since reading the same character through std::process::Command works without problems:

	let output = std::process::Command::new("echo").arg("∀").output().expect("1");
	let l = std::str::from_utf8(&output.stdout).expect("2");
	println!("echo: {}", l);

@lenianiva

lenianiva commented Aug 17, 2023

I dug into this a bit more and I think the problem is with NBReader. The following test fails when put into reader.rs:

    #[test]
    fn test_expect_unicode() {
        let f = io::Cursor::new("∀ melon\r\n");
        let mut r = NBReader::new(f, None);
        assert_eq!(
            ("∀ melon".to_string(), "\r\n".to_string()),
            r.read_until(&ReadUntil::String("\r\n".to_string()))
                .expect("cannot read line")
        );
        // check for EOF
        match r.read_until(&ReadUntil::NBytes(10)) {
            Ok(_) => panic!(),
            Err(Error::EOF { .. }) => {}
            Err(_) => panic!(),
        }
    }

This is because read_into_buffer casts each u8 to a char. The cast maps a byte to the Unicode code point with the same value instead of decoding the bytes as UTF-8:

    fn read_into_buffer(&mut self) -> Result<(), Error> {
        if self.eof {
            return Ok(());
        }
        while let Ok(from_channel) = self.reader.try_recv() {
            match from_channel {
                Ok(PipedChar::Char(c)) => self.buffer.push(c as char),
                Ok(PipedChar::EOF) => self.eof = true,
                // this is just from experience, e.g. "sleep 5" returns the other error which
                // most probably means that there is no stdout stream at all -> send EOF
                // this only happens on Linux, not on OSX
                Err(PipeError::IO(ref err)) => {
                    // For an explanation of why we use `raw_os_error` see:
                    // https://github.com/zhiburt/ptyprocess/commit/df003c8e3ff326f7d17bc723bc7c27c50495bb62
                    self.eof = err.raw_os_error() == Some(5)
                }
            }
        }
        Ok(())
    }

The cast is there because PipedChar::Char carries a u8, while buffer is a String, which can only be extended with chars.
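The effect of the cast can be reproduced in isolation, without spawning a process. The following is a minimal sketch; the helper name `cast_bytes_to_string` is hypothetical and not part of rexpect. It shows that casting the three UTF-8 bytes of "∀" one at a time yields exactly the six bytes reported in the output above.

```rust
/// Hypothetical helper (not part of rexpect) that mimics `c as char`:
/// every byte becomes the code point with the same numeric value.
fn cast_bytes_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

fn main() {
    let input = "∀";

    // UTF-8 encoding of U+2200.
    println!("{:?}", input.as_bytes()); // [226, 136, 128]

    // Each byte is treated as its own code point (U+00E2, U+0088, U+0080),
    // and each of those needs two bytes when stored in a String.
    let mangled = cast_bytes_to_string(input.as_bytes());
    println!("{:?}", mangled.as_bytes()); // [195, 162, 194, 136, 194, 128]

    // Decoding the same bytes as UTF-8 recovers the original text.
    let decoded = std::str::from_utf8(input.as_bytes()).expect("valid UTF-8");
    assert_eq!(decoded, input);
}
```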

This behaviour diverges from pexpect. I see three possible solutions:

  1. Change PipedChar(u8) to PipedChar(char). However, if the program sends only part of a multi-byte character and then stops, the reader would hang.
  2. Change the type of buffer to something like Vec<u8>, which does not interpret Unicode at all, but this feels like kicking the problem down the road.
  3. Add a decoder on the receiving end of PipedChar objects to choose between UTF-8 and ASCII behaviour (pexpect behaves like this).
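A rough sketch of what decoding on the receiving end could look like, using only the standard library. `Utf8Buffer` and its methods are hypothetical names for illustration and are not rexpect's API. Bytes are collected in a pending Vec<u8>, and only complete UTF-8 sequences are moved into the String, so a multi-byte character that arrives one byte at a time is never mangled and never blocks the caller.

```rust
use std::str;

/// Hypothetical buffer (not part of rexpect): holds decoded text plus any
/// trailing bytes that do not yet form a complete UTF-8 sequence.
#[derive(Default)]
struct Utf8Buffer {
    pending: Vec<u8>,
    text: String,
}

impl Utf8Buffer {
    /// Accept one raw byte, as PipedChar::Char(u8) would deliver it.
    fn push_byte(&mut self, b: u8) {
        self.pending.push(b);
        self.drain_pending();
    }

    /// Move every complete UTF-8 sequence from `pending` into `text`.
    fn drain_pending(&mut self) {
        loop {
            let (valid, bad_len) = match str::from_utf8(&self.pending) {
                Ok(s) => {
                    self.text.push_str(s);
                    self.pending.clear();
                    return;
                }
                Err(e) => (e.valid_up_to(), e.error_len()),
            };
            // The first `valid` bytes are known to be well-formed UTF-8.
            self.text
                .push_str(&String::from_utf8_lossy(&self.pending[..valid]));
            match bad_len {
                // Incomplete sequence at the end: keep it and wait for more.
                None => {
                    self.pending.drain(..valid);
                    return;
                }
                // Invalid bytes: substitute U+FFFD and keep going.
                Some(n) => {
                    self.text.push('\u{FFFD}');
                    self.pending.drain(..valid + n);
                }
            }
        }
    }
}

fn main() {
    let mut buf = Utf8Buffer::default();
    for &b in "∀ melon\r\n".as_bytes() {
        buf.push_byte(b);
    }
    assert_eq!(buf.text, "∀ melon\r\n");
    assert!(buf.pending.is_empty());
}
```

If the process exits after sending only part of a character, the leftover bytes stay in `pending`, and the caller can decide at EOF whether to drop them or report an error. Replacing invalid bytes with U+FFFD is one possible policy, chosen here only to keep the sketch short.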

@lypanov

lypanov commented Nov 25, 2023

Running into this issue now also. Would be lovely to see the MR merged :)
