Text Processing Language for a computational physicist

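In summary: AWK is installed on virtually every Unix system and is perhaps easier to learn because of its C-like syntax, while Perl has more features (such as special character classes) and is quick to use. Both can be difficult to read.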
  • #1
Useful nucleus
So I have been using a combination of Linux shell and Fortran to process big output files from my simulations. But I realized that it would save me a lot of time and effort if I learned a text processing language. I got different recommendations, and it seems that AWK and Perl are at the top of the list. Ideally I need something that I can learn fast but that is also efficient.
Any tips would be appreciated.
 
  • #2
I'd go with AWK. It's on every distro of Unix, and is available on Windows too.

It doesn't have all the special character classes of Perl but is perhaps easier to learn as it follows a C-like syntax.

Code:
#!/bin/awk -f

BEGIN {

}

/.../ {...}

...

END {

}

The program format is pretty straightforward: an initializing BEGIN block, a finalizing END block, and a bunch of pattern-matching rules in between.
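As a minimal sketch of that layout, here is an awk program wrapped in a shell function that counts matching lines and averages their last field. The `ENERGY <step> <value>` line format and the name `mean_energy` are made up for illustration; they are not taken from any real simulation output.

```shell
#!/bin/sh
# Hypothetical helper: average the last field of lines starting with "ENERGY".
# The "ENERGY <step> <value>" layout is an assumed format, not one from the thread.
mean_energy() {
    awk '
        BEGIN { n = 0; sum = 0 }          # runs once, before any input is read
        /^ENERGY/ { sum += $NF; n++ }     # runs for every line matching the pattern
        END {                             # runs once, after the last line
            if (n > 0) printf "%d %.3f\n", n, sum / n
        }
    '
}

# Sample input: two ENERGY lines and one line the rule should skip.
printf 'ENERGY 1 -1.50\n# comment\nENERGY 2 -2.50\n' | mean_energy
```

For the sample input this prints `2 -2.000`: two lines matched, and their mean is -2.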

If you have a specific text format to parse, I can help you.

One caveat is that when a script fails, it may not be clear which line the error is on. I haven't run into that much, though. I use awk a lot for finding files, creating text-based menu selections, scanning log4j output, and simplifying other arcane programmer tasks.
 
  • #3
Thank you, jedishrfu! I actually do not have something that I want to parse right now, but in the next few months I need to automate a lot of post-simulation analysis to save myself time. I will definitely get back to the forums if I have questions. Thanks a lot for offering help!

Just one more question: do you recommend a particular book for learning AWK?
 
  • #5
Thank you very much! I think I can borrow a copy of the book from the library.
 
  • #6
I would suggest perl if you want something that's quick to use and powerful, python if you want something more like a real programming language (but which requires a little more effort to do stuff). I honestly didn't know people still use awk...
 
  • #7
Coin said:
I would suggest perl if you want something that's quick to use and powerful, python if you want something more like a real programming language (but which requires a little more effort to do stuff). I honestly didn't know people still use awk...

Both are good, but AWK is a classic. Others to consider are Ruby and Groovy, which provide OO features. Groovy is especially nice because of its close connection to Java, so close that you can copy Java source into your script and, with very few changes, make it work as is.

The trouble with these is that they sometimes aren't installed on the *nix distro that you're using, but AWK is always present. Awk is also used a lot as one-liners in shell scripts to get things done that the script language just can't.
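A sketch of the kind of one-liner meant here: POSIX shell arithmetic is integer-only, so summing a column of floating-point numbers is typically handed to awk. The three sample rows are invented for illustration.

```shell
#!/bin/sh
# Sum the second whitespace-separated column; sh alone has no floating-point math.
# The sample rows below are made-up data.
total=$(printf '1 0.25\n2 0.50\n3 1.75\n' | awk '{ s += $2 } END { print s }')
echo "total = $total"
```

With these rows the script prints `total = 2.5`. In a real script the `printf` would be replaced by the output file or command being processed.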

Biologists use Perl, Python and sometimes Ruby a lot for data manipulation.

In my own work I use AWK for developer tools and Groovy for Java source code parsing when I need to migrate large amounts of code and need to play with the Java syntax a lot or work with XML files. The regular expressions used in AWK are much the same as in many other languages, so you can't go wrong learning them first with AWK.

For me the best features of AWK were its closeness to C syntax, its ability to run other commands and parse their output, and the fact that it had associative arrays, which I used for a lot of table lookup tasks. Groovy has many of the same features, with associative arrays being replaced by properties objects.
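To make the last two AWK features concrete, here is a small sketch under stated assumptions: the function name `tag_species`, the `run-42` tag, and the `echo` stand-in command are all hypothetical. It fills an associative array used as a lookup table and reads one line of output from an external command with `getline`.

```shell
#!/bin/sh
# Hypothetical helper: look up each input symbol in a table and prefix a tag
# obtained from an external command.
tag_species() {
    awk '
        BEGIN {
            # Associative array keyed by string (standard atomic weights, rounded).
            mass["H"] = 1.008; mass["C"] = 12.011; mass["O"] = 15.999
            # Run an external command and read the first line of its output.
            cmd = "echo run-42"      # stand-in for any real command
            cmd | getline tag
            close(cmd)
        }
        {
            # Table lookup with a fallback for keys that are not in the array.
            m = (($1 in mass) ? mass[$1] : "unknown")
            print tag, $1, m
        }
    '
}

printf 'H\nXx\nO\n' | tag_species
```

Testing with `$1 in mass` before indexing avoids silently creating an empty entry for an unknown key, which a bare `mass[$1]` would do.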
 
  • #8
Most modern programming languages have regular expressions that are quite powerful. So unless you are having severe performance issues, use a "good" language that you already know, and just learn the regular expressions.

Perl and awk are not my favorites... people in here will probably scream, but they can be rather difficult to read. I won't argue with their power, but many of the same capabilities now exist in, say, Java.
 

