
perl-utils

Table of Contents

Preamble

perl-utils is a set of text- and file-oriented utilities. The text-oriented scripts are supposed to be used mostly for processing paragraphs. By default, a paragraph is identified as a bunch of text lines delimited by empty or blank lines.

Assuming a text file is a set of paragraphs, it is easier to sort, merge and filter files without losing the links between the lines of a paragraph.

For example, multiline log entries in log files can carry additional useful information. Using grep -C (or grep -A, or grep -B) doesn't guarantee complete extraction of a particular log entry (or can extract parts of other log entries that are not needed at the moment).

Paragraph processing utilities

paragrep

paragrep - grep-like filter for searching matches in paragraphs.

paragrep assumes the input consists of paragraphs and prints the paragraphs matching a pattern. A paragraph is identified as a block of text delimited by empty or blank lines.

The initial version was very simple and was implemented as a shell function invoking a Perl inline script for grepping log files:

paragrep() {
	perl -ne '
	if ( m/$break_of_para/ ) {
		print $para if defined $para && $para =~ /$regexp/;
		$para = "";
	}
	$para .= $_;
	END {
		print $para if defined $para && $para =~ /$regexp/;
	}
	' -s -- -break_of_para="$1" -regexp="$2" "${@:3}"
}

or

paragrep() {
	perl -ne '
	if ( m/$break_of_para/ ) {
		print $para if defined $para && $para =~ /$regexp/;
		$para = "";
	}
	$para .= $_;
	# flush on eof, after the current line has been appended,
	# so the last line of the last paragraph is not lost
	if ( eof ) {
		print $para if defined $para && $para =~ /$regexp/;
		$para = "";
	}
	' -s -- -break_of_para="$1" -regexp="$2" "${@:3}"
}

Later I decided to implement it as a standalone script, adding more functionality and flexibility.

Example

Each log entry in a log file usually begins with a timestamp in the generalized numeric form date time, which can be covered by a single pattern without reflecting on which particular date format was used to output dates:

paragrep -Pp '^\d+[/-]\d+[/-]\d+ \d+:\d+:\d+' PATTERN FILENAME

There are also aliases for parsing log files and INI-like configuration files:

alias lgrep="paragrep -Pp '^\d+[/-]\d+[/-]\d+ \d+:\d+:\d+'"
alias cgrep="paragrep -Pp '^(#@ |#-> )?\['"

Similar tools

While working on the script I found a lot of interesting implementations of the same task in different languages. Here is a short excerpt of those that interested me:

logmerge

Small and powerful script to merge two or more logfiles so that multiline entries appear in the correct chronological order without breaking up log entries.

Other text-oriented utilities

sponge

sponge is a Perl version of sponge from the Debian package moreutils.

It reads standard input into memory and writes it out to the specified file. Unlike a shell redirect, the script soaks up all its input before opening the output file. This allows constructing pipelines that read from and write to the same file. If no file is specified, it writes to STDOUT.

My first release was a Perl inline script within a shell function:

sponge() {
	perl -ne '
	push @lines, $_;
	END {
		open(OUT, ">$file")
		or die "sponge: cannot open $file: $!\n";
		print OUT @lines;
		close(OUT);
	}
	' -s -- -file="$1"
}

Perl has many ways to do it. So here is a slightly different way, which also supports the -a option for appending to the file:

sponge() {
	perl -e '
	$file = shift || "-";
	@lines = <>;
	open OUT, ( defined $a ? ">>" : ">" ) . $file
	or die "sponge: cannot open $file: $!\n";
	print OUT @lines;
	close OUT;
	' -s -- "$@"
}
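Here is a hypothetical round trip with the function above (the file name and contents are made up). Because sponge soaks up all input before opening the output file, sorting a file "in place" works where a plain redirect would truncate the file before sort could read it:

```shell
# the sponge function from above, repeated so the snippet is self-contained
sponge() {
	perl -e '
	$file = shift || "-";
	@lines = <>;
	open OUT, ( defined $a ? ">>" : ">" ) . $file
	or die "sponge: cannot open $file: $!\n";
	print OUT @lines;
	close OUT;
	' -s -- "$@"
}

printf '%s\n' banana cherry apple > fruits.txt
sort fruits.txt | sponge fruits.txt	# sort a file "in place"
echo date | sponge -a fruits.txt	# -a appends instead of overwriting
cat fruits.txt
# apple
# banana
# cherry
# date
```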

Awk can do sponge as well:

#!/usr/bin/awk -f

# slurp the stuff and burp...
# ... | awk -f sponge.awk [-v ORS="\r\n"] [-v append=1] [-v file=file]

NR == 1	{ lines = $0 }
NR != 1	{ lines = lines ORS $0 }

END	{
	if ( ! file ) { file = "-" }
	if ( append ) {
		print lines >> file;
	} else {
		print lines >  file;
	}
}

or the same, but more conveniently wrapped in a shell function:

#!/bin/sh

# slurp the stuff and burp...
# ... | sponge [-a] file

sponge() (
	case "$1" in
	-a | --append )
		append=1
		file="$2"
		;;
	* )
		append=""
		file="$1"
		;;
	esac

	awk -v append="$append" -v file="$file" '
NR == 1	{ lines = $0 }
NR != 1	{ lines = lines ORS $0 }

END	{
	if ( ! file ) { file = "-" }
	if ( append ) {
		print lines >> file;
	} else {
		print lines >  file;
	}
}'
)

sponge "$@"

Example

An abstract example of usage is described in the tool's help and shown below:

sed '...' file | grep '...' | sponge [-a] file

See also

transpose

This is a Perl implementation of the AWK script that transposes the input file so rows become columns and columns become rows:

#!/usr/bin/awk -f

{
	for (i = 1; i <= NF; i++) {
		a[NR,i] = $i
	}
}

NF > p {
	p = NF
}

END {
	for (j = 1; j <= p; j++) {
		str = a[1,j]
		for (i = 2; i <= NR; i++) {
			str = str OFS a[i,j];
		}
		print str
	}
}

Example

( echo {1..5} ; echo {100..104} ) | ./transpose
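The AWK version shown above can be checked directly as well; here the script is written to a temporary file first (the file name transpose.awk is made up for the example):

```shell
# the transposing AWK script from above, saved to a file
cat > transpose.awk <<'EOF'
{
	for (i = 1; i <= NF; i++) {
		a[NR,i] = $i
	}
}

NF > p {
	p = NF
}

END {
	for (j = 1; j <= p; j++) {
		str = a[1,j]
		for (i = 2; i <= NR; i++) {
			str = str OFS a[i,j];
		}
		print str
	}
}
EOF

# a 2x3 matrix becomes a 3x2 matrix
printf '1 2 3\n4 5 6\n' | awk -f transpose.awk
# 1 4
# 2 5
# 3 6
```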

See also

File-oriented utilities

file-rename

file-rename renames the files supplied according to the rule specified as the first argument. It supports several ways to rename files: applying Perl code to rename, copy or move files; rotating names cyclically left or right; swapping two names; reversing the whole list of names.

Example

file-rename 's/\.bak$//' *.bak
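The rename-by-expression mode follows the idea of the classic Perl rename script: evaluate the given expression against each name and rename the file when the name changes. A minimal sketch of that core idea (this is not the actual file-rename code, and simple_rename is a made-up name):

```shell
# a minimal sketch of renaming by a Perl expression; NOT the real file-rename
simple_rename() {
	perl -e '
	my $code = shift;
	for ( @ARGV ) {
		my $old = $_;
		eval $code;		# e.g. s/\.bak$// modifies $_
		die $@ if $@;
		next if $old eq $_;
		rename $old, $_ or warn "cannot rename $old: $!\n";
	}
	' "$@"
}

touch report.bak notes.bak
simple_rename 's/\.bak$//' report.bak notes.bak
# report.bak and notes.bak are now report and notes
```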

See Also

To be continued...