Re: Best tool for large (500MB) text manipulation
  • Subject: Re: Best tool for large (500MB) text manipulation
  • From: Jan Steinman <email@hidden>
  • Date: Sun, 17 Oct 2004 08:33:29 -0700

From: Bruce Robertson <email@hidden>

I have a relative who deals with very large data files, for example 500MB
text files. He needs to find, extract, and manipulate data from them. He has
been using BBEdit but is coming up against its limits. The application
happens to be analysis of stock trading data.

What sort of 'limits' is he hitting?

What sort of capabilities of BBEdit is he using?

What sort of CS knowledge does he have? (How much is he willing to learn? :-)

Is this data long-lived, or transient?

There are TONS of non-GUI data manipulation tools in UNIX. I suspect that, with half a gig of data, he's simply running into memory problems. Adding RAM to his system may help in the short term.

But a long-term fix is going to involve NOT loading the entire file into memory at once, which is what happens in BBEdit.

If he's willing to learn some geek-speak, he should probably look at standard UNIX tools, like awk, sed, and grep. By learning just a bit about Regular Expressions, he'll have a MUCH better time of "finding, extracting, and manipulating data" with those tools. A more ambitious (and more powerful) approach would be learning Perl.
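To give a feel for what a Regular Expression buys you, here is a rough sketch in Python rather than awk or Perl (my choice of language, the idea is the same). The line layout (date, symbol, price, volume, comma-separated) and the name `find_trades` are assumptions made up for illustration; I don't know what his files actually look like.

```python
import re

# Hypothetical line layout (an assumption, not from the original question):
#   2004-10-15,AAPL,45.10,1200
TRADE = re.compile(r"^(\d{4}-\d{2}-\d{2}),([A-Z]+),(\d+\.\d+),(\d+)$")

def find_trades(lines, symbol):
    """Yield (date, price, volume) for every well-formed line for one symbol."""
    for line in lines:
        m = TRADE.match(line.strip())
        # Lines that don't fit the pattern are skipped, much as grep would skip them.
        if m and m.group(2) == symbol:
            yield m.group(1), float(m.group(3)), int(m.group(4))

sample = [
    "2004-10-15,AAPL,45.10,1200",
    "2004-10-15,IBM,84.85,300",
    "this line is garbage",
    "2004-10-15,AAPL,45.12,500",
]
results = list(find_trades(sample, "AAPL"))
```

The pattern does the "finding", the parenthesized groups do the "extracting", and whatever he does with the tuples afterwards is the "manipulating".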

The key to these UNIX programs is that they operate on streams of data, so they work well in a limited memory space. There may well be ready-made solutions to problems similar to his, available under the GPL or in the public domain. But they will require a considerable commitment in learning time.
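The streaming idea, sketched in Python under the same made-up assumptions (comma-separated date, symbol, price, volume; `total_volume` is a hypothetical name): the file is read one line at a time, so memory use stays roughly flat whether the file is 5KB or 500MB.

```python
import os
import tempfile

def total_volume(path, symbol):
    """Sum the volume column for one symbol without loading the whole file."""
    total = 0
    with open(path, "r", encoding="utf-8") as f:
        for line in f:  # a file object hands back one line at a time
            fields = line.split(",")
            if len(fields) == 4 and fields[1] == symbol:
                total += int(fields[3])  # int() tolerates the trailing newline
    return total

# Tiny stand-in for the real 500MB file, written to a temporary location.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("2004-10-15,AAPL,45.10,1200\n")
    tmp.write("2004-10-15,IBM,84.85,300\n")
    tmp.write("2004-10-15,AAPL,45.12,500\n")
    demo_path = tmp.name

aapl_volume = total_volume(demo_path, "AAPL")
os.remove(demo_path)
```

Only the running total and the current line are ever held in memory, which is the same trick awk, sed, and grep rely on.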

Yet another approach (also requiring more or less investment in learning) would be a real database. A database would work better with long-lived data -- "write once, read many" data. FileMaker Pro is fairly easy to learn, but has serious shortcomings for the long term. MySQL is free, but will require more learning time.
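As a rough picture of the "write once, read many" pattern, here is a sketch using SQLite through Python's built-in `sqlite3` module. Note that this swaps in SQLite for MySQL only because it needs no server to try out; the table and column names are invented for illustration.

```python
import sqlite3

# ":memory:" keeps the demo self-contained; long-lived data would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trades (day TEXT, symbol TEXT, price REAL, volume INTEGER)"
)

# Write once: load the rows (in real use, streamed in from the text file).
rows = [
    ("2004-10-15", "AAPL", 45.10, 1200),
    ("2004-10-15", "IBM", 84.85, 300),
    ("2004-10-15", "AAPL", 45.12, 500),
]
conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?)", rows)
conn.execute("CREATE INDEX idx_trades_symbol ON trades (symbol)")
conn.commit()

# Read many: each new question is a query, not another pass over the raw text.
summary = conn.execute(
    "SELECT symbol, SUM(volume), MAX(price) FROM trades "
    "GROUP BY symbol ORDER BY symbol"
).fetchall()
conn.close()
```

The up-front cost is the load and the index; after that, each new question costs a line of SQL instead of another crawl through half a gig of text.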

:::: If addiction is judged by how long a dumb animal will sit pressing a lever to get a "fix" of something, to its own detriment, then I would conclude that the Internet is far more addictive than cocaine. -- Rob Stampfli
:::: Jan Steinman <http://www.Bytesmiths.com/Image/98-4880-34>

