Mailing Lists: Apple Mailing Lists

 

Re: problems copying file



Daniel Child <email@hidden> wrote off-list:

>>In this Java program, did you use Reader and Writer (which translate
>>between bytes and chars), or InputStream and OutputStream (which only push
>>bytes around without translation)?
>The latter.

Actually, you used the former. The Readers you used are connected to an
InputStream, but you are quite definitely using Readers. Since the Reader
class is itself abstract, you didn't use an actual instance of Reader, but
you did use instances of Reader subclasses. This means that the
fundamental behavior of Reader, which is to represent a stream of chars
(not a stream of bytes), is intrinsic to and inseparable from the
subclasses you've used.
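To make the bytes-versus-chars distinction concrete, here is a small
sketch. The class and method names are hypothetical, and StandardCharsets
needs Java 7 or later. It reads the same UTF-8 bytes once through an
InputStream and once through an InputStreamReader, and counts what each
one delivers.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class BytesVersusChars {
    // Counts the raw bytes an InputStream delivers.
    static int countBytes(byte[] data) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        int count = 0;
        while (in.read() != -1) { // read() returns one byte as 0..255, or -1 at end
            count++;
        }
        return count;
    }

    // Counts the chars a Reader delivers after decoding the same bytes as UTF-8.
    static int countChars(byte[] data) throws IOException {
        Reader r = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        int count = 0;
        while (r.read() != -1) { // read() returns one decoded char, or -1 at end
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // "A", e-acute (U+00E9) and the CJK char U+65E5: 1 + 2 + 3 bytes in UTF-8
        byte[] utf8 = "A\u00e9\u65e5".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(utf8) + " bytes, " + countChars(utf8) + " chars");
    }
}
```

For that three-character string the InputStream delivers 6 bytes and the
Reader delivers 3 chars: they are not the same stream of data.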


>>The way to avoid this problem is to use InputStream and OutputStream,
>>rather than Reader and Writer, avoiding any translations at all.
>Then it's a mystery, because that is what I used. I'll try your
>suggestion, but I don't really know UNIX yet, so not sure if it will work.

This has nothing to do with Unix, per se. The program works on Unix by
accident. When the program is correct, it will work everywhere (Unix, Mac
OS X, Windows, whatever).

The problem is that you didn't use JUST InputStream and OutputStream, and
you didn't copy the original BYTES; instead, you went through byte/char
translations. Those translations, unwitting though they may be, have
hopelessly mangled the original UTF-8-encoded data of the RTF file.

Let's break down what you've done in the following code fragment, taken
from the source of your program. I've numbered the lines only as a
reference for discussion:
1. File f1 = new File(sSourceFile);
2. FileInputStream fis = new FileInputStream(f1);
3. InputStreamReader isrFile = new InputStreamReader(fis);
4. BufferedReader brFile = new BufferedReader(isrFile);

In line 1, you make a File from a String representing the filename. The
String was read earlier from the terminal (see the full source below).

In line 2, you start by opening a FileInputStream. Yes, you have an
InputStream at this point.

In line 3, you connect an InputStreamReader to the FileInputStream. At
this point, and every point thereafter, you will be working with various
kinds of Reader, not InputStream.

The InputStreamReader is where the BYTES read from the FileInputStream are
converted into CHARS, which then move through the various kinds of Reader.
Because you didn't give the InputStreamReader a char-set, that conversion
uses the platform's default char-set, which on Mac OS X is "MacRoman", and
interprets the bytes of the InputStream as MacRoman chars. This is where at
least half the data-mangling occurs.
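A minimal sketch of what that decoding step does to UTF-8 data when the
wrong char-set is used. It assumes Java 7 or later, and it uses ISO-8859-1
as a stand-in for MacRoman, because MacRoman is not one of the char-sets
every Java runtime is required to provide; like MacRoman, ISO-8859-1 turns
each byte into one char. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WrongCharsetDemo {
    // Decodes bytes using the given charset, as an InputStreamReader would.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String original = "\u65e5\u672c"; // two CJK chars
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8); // six bytes

        String right = decode(utf8, StandardCharsets.UTF_8);
        // Single-byte stand-in for MacRoman: six bytes become six unrelated chars.
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals(original)); // true
        System.out.println(wrong.length());         // 6
        System.out.println(wrong.equals(original)); // false
    }
}
```

If you genuinely want chars, pass the char-set explicitly, e.g.
new InputStreamReader(fis, "UTF-8"), rather than relying on the platform
default. For a file copy, though, no decoding should happen at all.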

In line 4, you connect a BufferedReader to the InputStreamReader. The data
that flows through this connection is entirely chars, because the
InputStreamReader has converted from bytes to chars by the time the
BufferedReader even sees the data.

Everything thereafter is operating on chars or sequences of chars
(Strings), and is NOT operating on the original UTF-8-encoded bytes of the
file. If you want to duplicate the BYTES of the original file, which you
do, you must operate on the original bytes, without any translation of
bytes to chars anywhere along the way.
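Here is a sketch of a copy that never leaves the byte world: it moves data
from any InputStream to any OutputStream through a byte array, so nothing
is decoded or encoded. The class name and the 8192-byte buffer size are
arbitrary choices for illustration. Because it copies one bufferful at a
time, it also never needs to hold the whole file in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

class StreamCopy {
    // Copies every byte from in to out, unchanged; no chars are involved.
    // Returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        // read(buffer) returns how many bytes it stored, or -1 at end of stream
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // write only the bytes actually read
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Every possible byte value, including ones that are not valid UTF-8 alone.
        byte[] data = new byte[256];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), out);
        System.out.println(copied + " bytes copied, identical: "
                + Arrays.equals(data, out.toByteArray()));
    }
}
```

Its main copies all 256 possible byte values through in-memory streams and
checks that the output is identical to the input.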


> // Read the file into a string and print to the screen
> String sText = "";
> do {
> sText += brFile.readLine() + "\n";
> } while (brFile.readLine() != null);
> System.out.println(sText);

This code is flawed in several ways.

First of all, you shouldn't even be using a Reader (of any kind) to read
the data as lines of chars. You should be using an InputStream to read the
data as arrays of bytes. In Java, these two things are completely
different: a Java char is a 16-bit Unicode code unit, not a byte. In other
languages, such as C, where a char is a single byte, they may be alike. I
don't know what your programming-language experience is, but if you're
patterning this Java code after something in C, it's wrong. Java is not C.

Second, you are losing half the lines you're reading, because you're
calling readLine() at two places in the loop. The readLine() method reads
AND RETURNS the next line of text. So if you read a line and just compare
it to null, without actually keeping the data, then your output will
consist only of every other line in the original: it keeps the first,
third, fifth, and so on, and drops the second, fourth, sixth, and so on.
And if the file has an even number of lines, the last readLine() inside
the loop body returns null, so the literal text "null" gets appended to
your output.

I have no idea how this could possibly work for anything but a file
containing exactly one line of text. I suggest trying your current program
on a file containing the numbers 0-10, one number per line, and examining
the copied output.
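The following sketch (hypothetical class and method names) runs the loop
from your program and the usual one-call-per-pass readLine() idiom over
the suggested test input, using a StringReader instead of a file so it is
self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ReadLineLoops {
    // The loop from the original program: readLine() is called twice per pass,
    // and the line read in the while condition is thrown away.
    static String buggy(BufferedReader br) throws IOException {
        String sText = "";
        do {
            sText += br.readLine() + "\n";
        } while (br.readLine() != null);
        return sText;
    }

    // The usual idiom: call readLine() once per pass and keep what it returns.
    static List<String> correct(BufferedReader br) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = br.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }

    // Builds the suggested test input: the numbers 0-10, one per line.
    static String numbers() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i <= 10; i++) {
            sb.append(i).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(buggy(new BufferedReader(new StringReader(numbers()))));
        System.out.println(correct(new BufferedReader(new StringReader(numbers()))));
    }
}
```

The buggy loop yields 0, 2, 4, 6, 8 and 10; the corrected loop yields all
eleven numbers. Feeding the buggy loop a two-line input such as "0\n1\n"
produces "0\nnull\n".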

Third, you should not be copying a file by reading it entirely into memory
before writing it out. A big file will easily exhaust memory, and then
your program fails.

Fourth, copying text by appending it to a String in a loop is one of the
most inefficient ways you can possibly do this. The exact reasons have to
do with String representation, the meaning of "+" in a String context, and
the relationship between String and StringBuffer. Those reasons are not
worth explaining in detail right now; just trust me that you shouldn't
copy data this way in a loop.
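For cases where you really do need a file's text in memory as a String
(not for copying; for that, use byte streams), a sketch of the usual
alternative to "+=" in a loop is a single StringBuilder (StringBuffer on
JDKs before 5.0) that is appended to in place. The class and method names
are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class AccumulateText {
    // Collects all lines into one String using a single growable buffer,
    // instead of building a brand-new String on every "+=".
    static String readAll(BufferedReader br) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            sb.append(line).append('\n'); // appends into the same buffer
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(readAll(new BufferedReader(new StringReader("one\ntwo\nthree\n"))));
    }
}
```

Note that this is still not a faithful copy: readLine() strips each line
terminator, so text with "\r\n" line endings comes back with "\n" endings.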


> // Create output files, streams, and writers
> File f2 = new File(sDestinFile);
> FileOutputStream fos = new FileOutputStream(f2);
> PrintStream ps = new PrintStream(fos);
> ps.println(sText);
> ps.close();

Using a PrintStream here has very little hope of giving correct results,
except by accident. You should be writing arrays of bytes to the
FileOutputStream, or to a BufferedOutputStream connected to the
FileOutputStream. Those bytes should be coming from a FileInputStream, or
a BufferedInputStream connected to it.
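A sketch of that stream arrangement, assuming Java 7 or later for
try-with-resources (on older JDKs you would close the streams in a finally
block instead). FileCopySketch and copyFile are hypothetical names, and
taking the file names from the command line instead of prompting for them
is just a simplification.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class FileCopySketch {
    // Copies the file named sourceName to destName byte for byte.
    // Both streams are closed automatically, even if an exception is thrown.
    static void copyFile(String sourceName, String destName) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(sourceName));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(destName))) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n); // raw bytes only; no char translation
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java FileCopySketch <source> <destination>");
            return;
        }
        copyFile(args[0], args[1]);
    }
}
```

Closing the BufferedOutputStream flushes its remaining buffered bytes to
the file, which is one reason the close matters.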

A PrintStream will perform platform-dependent char-to-byte translations.
On Mac OS X, that will be a Unicode to MacRoman translation.
Unfortunately, this won't preserve any Asian chars, because:
A) MacRoman can't represent Asian chars.
B) The original data was in UTF-8 anyway, not MacRoman.
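A small sketch of the char-to-byte side. It uses US-ASCII as a stand-in
for MacRoman (because MacRoman isn't guaranteed to exist in every Java
runtime); neither can represent Asian chars. String.getBytes(Charset)
doesn't fail on such chars: it silently substitutes the char-set's
replacement byte, which is '?' for US-ASCII, and the encoder inside a
PrintStream replaces unmappable chars the same way. The class name is
hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UnmappableDemo {
    // Encodes text with the given charset. Chars the charset cannot
    // represent are replaced by its default replacement byte, with no error.
    static byte[] encode(String text, Charset cs) {
        return text.getBytes(cs);
    }

    public static void main(String[] args) {
        String text = "RTF \u65e5\u672c"; // "RTF " followed by two CJK chars
        byte[] ascii = encode(text, StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // RTF ??
        byte[] utf8 = encode(text, StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```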


>>Which Unix did you run it on?
>The one they use at U. Hawaii. No idea specifically what kind. I just ran
>their java compiler and it worked the same as on Mac for text files (but
>not rtf).

It worked because of a lucky accident of byte-to-char and char-to-byte
translations on that version of Unix. It didn't work because it was doing
things correctly.

Are you writing this program as homework for a class, or as a lesson in
some kind of guided tutorial?

If not, I strongly advise that you work through the Java tutorial,
especially the parts that describe the fundamental differences between the
In/OutputStreams and Reader/Writer classes:
<http://java.sun.com/docs/books/tutorial/>

If you're taking a Java programming class at U. Hawaii, I also suggest
asking your teacher about the differences. If you're not taking a Java
programming class, I suggest asking someone who is, or enrolling in such a
class yourself.


For the benefit of the Java-Dev list, the entire source of the problematic
program follows:

import java.io.*;
import java.util.*;
import java.text.*;

public class FileCopy {

    public static void main (String args[]) throws Exception {
        // Declare input and output filenames
        String sSourceFile, sDestinFile;

        // Enable input from the terminal
        InputStreamReader isr = new InputStreamReader(System.in);
        BufferedReader brTerminal = new BufferedReader(isr);

        try {
            // Get the names of source and destination files
            System.out.print("Enter name of source file: ");
            sSourceFile = brTerminal.readLine();
            System.out.print("Enter name of destination file: ");
            sDestinFile = brTerminal.readLine();

            // Create input file, streams, and readers
            File f1 = new File(sSourceFile);
            FileInputStream fis = new FileInputStream(f1);
            InputStreamReader isrFile = new InputStreamReader(fis);
            BufferedReader brFile = new BufferedReader(isrFile);

            // Read the file into a string and print to the screen
            String sText = "";
            do {
                sText += brFile.readLine() + "\n";
            } while (brFile.readLine() != null);
            System.out.println(sText);

            // Create output files, streams, and writers
            File f2 = new File(sDestinFile);
            FileOutputStream fos = new FileOutputStream(f2);
            PrintStream ps = new PrintStream(fos);
            ps.println(sText);
            ps.close();
        } // end try
        catch (FileNotFoundException fnfe) {
            System.out.println("The file was not found.");
        }
    }
}

-- GG
_______________________________________________
java-dev mailing list | email@hidden





Copyright © 2007 Apple Inc. All rights reserved.