Bit of help needed in some bash/shell script please
I need to chop up a string in bash/shell and put the substrings into variables for use later on.
Would someone mind helping out with the syntax a little for me?
The string has a header on it and then the parameters follow on. The header describes the start position and length of each parameter.
It's all basic stuff but my shell script knowledge is pants.
Here's the string with header
[0:7,7:1,8:1,9:4,13:21,34:186,220:38]add_rcs11eth0port 3330 and not tcpframe.time,frame.number,ip.src,ip.dst,tcp.srcport,tcp.dstport,udp.srcport,udp.dstport,ip.proto,frame.time,ip.src,ip.dst,tcp.srcport,tcp.dstport,udp.srcport,udp.dstport,ip.proto,bgan,cncpbgan.imsi == 12 || cncp.opcode == 1234
This is the header - enclosed in []
[0:7,7:1,8:1,9:4,13:21,34:186,220:38]
There are always seven parameters defined.
The header declares the start and length of each parameter in the string as an xx:yy pair; the pairs are delimited by commas.
The start of the string is 0-indexed and begins after the ']'
So
p1 starts at 0 has length 7 and is add_rcs
p2 starts at 7 has length 1 and is 1
p3 starts at 8 has length 1 and is 1
p4 starts at 9 has length 4 and is eth0
etc.
Please would someone give me a bit of help coding this in bash/shell?
Even just to get the start and length values would be a help.
Many thanks
Ade
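To get just the start and length values out of the header, bash's own parameter expansion may be enough, with no sed or awk involved. The following is a minimal sketch rather than a definitive answer: the variable names (`header`, `pairs`, `starts`, `lengths`) are made up for illustration, and it assumes bash rather than plain sh, because it relies on arrays and a here-string.

```shell
#!/bin/bash
# Header only, as quoted in the question.
header='[0:7,7:1,8:1,9:4,13:21,34:186,220:38]'

# Strip the surrounding brackets with parameter expansion.
header=${header#\[}
header=${header%\]}

# Split on commas into an array of "start:length" pairs.
IFS=, read -r -a pairs <<< "$header"

# Separate each pair at the colon (hypothetical array names).
starts=()
lengths=()
for pair in "${pairs[@]}"; do
    starts+=("${pair%%:*}")     # text before the colon
    lengths+=("${pair##*:}")    # text after the colon
done

# Print one line per parameter.
for i in "${!pairs[@]}"; do
    printf 'p%d: start=%s length=%s\n' "$((i + 1))" "${starts[i]}" "${lengths[i]}"
done
```

The arrays are zero-indexed, so the values for p4 would be `${starts[3]}` and `${lengths[3]}`.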
Are you limited to shell scripting for this? Something like Ruby with its String, Array and RegExp classes would be just the job for this. Perl's probably got similar capabilities.
Some Unix greybeard may be able to cook up a bunch of sed and awk commands with regular expressions to cut the string up, but it'll be impenetrable and hard to debug. I'd go with a scripting language with rich string / array support to do this job, it'd be much easier.
cyberface said:
Are you limited to shell scripting for this? Something like Ruby with its String, Array and RegExp classes would be just the job for this. Perl's probably got similar capabilities.
Some Unix greybeard may be able to cook up a bunch of sed and awk commands with regular expressions to cut the string up, but it'll be impenetrable and hard to debug. I'd go with a scripting language with rich string / array support to do this job, it'd be much easier.
sed & awk are bog-standard ways to deal with these issues; ruby and to a lesser extent perl would make things more complicated, not less.
That sed (badum-tish), I haven't got a solution for the OP, but it should be pretty simple.
I'd never do anything like this any other way than in perl. I'm also a great believer in the following aphorism; "You have a problem. You decide to solve it with regular expressions. Now you have two problems."
In that spirit, here's a quick and dirty perl program to chop the data up (Lord knows what Pistonheads is going to do to the layout and brackets.);
#!/usr/bin/perl
# Read each input line (from files named on the command line, or stdin)
while (<>)
{
    chomp;
    # Split the header and data in two by the ']'
    ($header,$data) = split(/\]/,$_);
    # Strip off the leading '['
    $header = substr ($header,1);
    # Split the header fields up by the ','. The @fields array now
    # contains 7 elements, each with the start & length of a data field
    @fields = split (/,/,$header);
    $i = 0;
    # For each field specifier
    foreach (@fields)
    {
        ($start,$length) = split (/:/,$_);
        $field_data[$i++] = substr ($data,$start,$length);
    }
    # The @field_data array now contains the individual fields
    # Print it out ...
    $i = 0;
    foreach (@field_data)
    {
        print $i++,":\t",$_,"\n";
    }
}
zaktoo said:
cyberface said:
Are you limited to shell scripting for this? Something like Ruby with its String, Array and RegExp classes would be just the job for this. Perl's probably got similar capabilities.
Some Unix greybeard may be able to cook up a bunch of sed and awk commands with regular expressions to cut the string up, but it'll be impenetrable and hard to debug. I'd go with a scripting language with rich string / array support to do this job, it'd be much easier.
sed & awk are bog-standard ways to deal with these issues; ruby and to a lesser extent perl would make things more complicated, not less.
That sed (badum-tish), I haven't got a solution for the OP, but it should be pretty simple.
I don't have the time but I could write a similar process to Zumbruk's perl script in Ruby that'd be more concise and elegant, but I assume he could too, and I'm not getting into a coding dick-waving contest.

cyberface said:
I don't have the time but I could write a similar process to Zumbruk's perl script in Ruby that'd be more concise and elegant, but I assume he could too, and I'm not getting into a coding dick-waving contest.
Quite so, in all respects.
Edited by Zumbruk on Saturday 13th October 17:08
Uninspired bash kludge...
{{{
#!/bin/bash
# Sample input line to extract THINGs from
LINE='[0:7,7:1,8:1,9:4,13:21,34:186,220:38]add_rcs11eth0port 3330 and not tcpframe.time,frame.number,ip.src,ip.dst,tcp.srcport,tcp.dstport,udp.srcport,udp.dstport,ip.proto,frame.time,ip.src,ip.dst,tcp.srcport,tcp.dstport,udp.srcport,udp.dstport,ip.proto,bgan,cncpbgan.imsi == 12 || cncp.opcode == 1234'
# Split line into HEADER and The Rest, using ] as delimiter
HDR=`echo "$LINE" | sed -e 's/\].*$//' -e 's/^\[//'`
LINE=`echo "$LINE" | sed -e 's/^.*\]//'`
# Iterate through the header
while [ -n "$HDR" ]
do
    # Extract first set of parameters, using anything other than
    # 0-9 or : as delimiter
    P=`echo "$HDR" | sed -e 's/[^0-9:].*$//'`
    # Chop that parameter set and its following delimiter
    # off the start of the header
    HDR=`echo "$HDR" | sed -e "s/$P//" -e 's/^[^0-9:]//'`
    # Split the parameter set into START and LENGTH,
    # using : as delimiter
    START=`echo "$P" | sed -e 's/:.*$//'`
    LENGTH=`echo "$P" | sed -e 's/^.*://'`
    # Use START and LENGTH to extract THING from the input line
    THING=`echo "$LINE" | sed -r -e "s/(^.{$START})(.{$LENGTH})(.*$)/\2/"`
    # Output THING
    echo "$THING"
done
}}}
gives
$ ./ex
add_rcs
1
1
eth0
port 3330 and not tcp
frame.time,frame.number,ip.src,ip.dst,tcp.srcport,tcp.dstport,udp.srcport,udp.dstport,ip.proto,frame.time,ip.src,ip.dst,tcp.srcport,tcp.dstport,udp.srcport,udp.dstport,ip.proto,bgan,cncp
bgan.imsi == 12 || cncp.opcode == 1234
$
looks right to me...
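For comparison, the same extraction could probably be done in bash alone, without calling sed at all. This is only a sketch under a few assumptions: it needs bash (arrays, here-strings and `${var:offset:length}` substring expansion are not in plain sh), the data is taken to start after the first ']', and the names `line`, `specs` and `fields` are invented for the example. The substring expansion counts from 0, which matches the header's convention.

```shell
#!/bin/bash
# Sample input line from the original post.
line='[0:7,7:1,8:1,9:4,13:21,34:186,220:38]add_rcs11eth0port 3330 and not tcpframe.time,frame.number,ip.src,ip.dst,tcp.srcport,tcp.dstport,udp.srcport,udp.dstport,ip.proto,frame.time,ip.src,ip.dst,tcp.srcport,tcp.dstport,udp.srcport,udp.dstport,ip.proto,bgan,cncpbgan.imsi == 12 || cncp.opcode == 1234'

header=${line%%\]*}     # everything before the first ']'
header=${header#\[}     # drop the leading '['
data=${line#*\]}        # everything after the first ']'

# Split the header on commas into "start:length" specifiers.
IFS=, read -r -a specs <<< "$header"

# Cut each field out of the data with substring expansion.
fields=()
for spec in "${specs[@]}"; do
    start=${spec%%:*}
    length=${spec##*:}
    fields+=("${data:start:length}")
done

# Show the result, numbering the parameters from 1.
for i in "${!fields[@]}"; do
    printf 'p%d=%s\n' "$((i + 1))" "${fields[i]}"
done
```

Each substring ends up in `fields[0]` to `fields[6]` for later use, so p4 would be `${fields[3]}`.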
GreenV8S said:
Zumbruk said:
"You have a problem. You decide to solve it with regular expressions. Now you have two problems."...
Zumbruk said:
split(/\]/,$_);

Edited by Zumbruk on Saturday 13th October 22:09
Zumbruk said:
Blimey, that's trivial - I started with one that pattern matched the whole of the header, but I wanted to keep it simple for the non-perl types.
Of course it is, I just found it amusing how you warned us against using RegExp and then used it yourself. 
Edited by GreenV8S on Saturday 13th October 22:18
mystomachehurts said:
GreenV8S said:
For what it's worth, I think it should be possible to solve the problem using four similarly trivial regular expressions.
Can we do this in bash? 
Tomorrow I will be mostly playing with Pigeon's script
Zumbruk's example wasn't even a proper complex regular expression - the slashes are merely there as escape characters. If you think Pidge's example is less complex, then run with it, but if those big sed commands don't immediately make sense to you, then I'd seriously advise using perl or ruby.
If I have time tomorrow then I'll have a go at a ruby script without any regexps or escape characters to make it nice and simple...
Pigeon said:
cyberface said:
those big sed commands
Those aren't big sed commands. Those are weeny little sed commands 
You can regexp the header in perl with something like (off the top of my head, so likely wrong);
/\[(\d+)\:(\d+)\,(\d+)\:(\d+)\,(\d+)\:(\d+)\,(\d+)\:(\d+)\,(\d+)\:(\d+)\,(\d+)\:(\d+)\,(\d+)\:(\d+)\]/;
And the parameters are then available in the metavariables $1,$2,$3 ... $14, but that's hideous. And I couldn't think of a way to iterate over the metavariables. Much easier to use the "split" function and bung them in an array, like wot I did.
(Oh, poo. Is there a way to do literal quoting in PH without it turning all your parentheses into smiley faces???)
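On the earlier question of whether the regular expression route works in bash: bash has a `[[ string =~ regex ]]` test that puts the capture groups into the `BASH_REMATCH` array, and an array can be looped over, which may sidestep the problem of iterating over numbered match variables. A rough sketch, assuming a reasonably recent bash and using made-up names (`pair`, `re`, `starts`, `lengths`); the pattern is built from one repeated sub-pattern rather than typed out fourteen times:

```shell
#!/bin/bash
# Header as quoted in the original post.
header='[0:7,7:1,8:1,9:4,13:21,34:186,220:38]'

# One "start:length" pair, with a capture group for each number.
pair='([0-9]+):([0-9]+)'
# Seven pairs separated by commas, enclosed in literal brackets.
re="^\[${pair},${pair},${pair},${pair},${pair},${pair},${pair}]"

starts=()
lengths=()
if [[ $header =~ $re ]]; then
    # BASH_REMATCH[0] is the whole match; groups start at index 1.
    for ((i = 1; i < ${#BASH_REMATCH[@]}; i += 2)); do
        starts+=("${BASH_REMATCH[i]}")
        lengths+=("${BASH_REMATCH[i+1]}")
        echo "start=${BASH_REMATCH[i]} length=${BASH_REMATCH[i+1]}"
    done
else
    echo "header did not match" >&2
fi
```

Splitting on commas and colons is still likely to be the simpler option; the regex version mainly buys a check that the header really has seven well-formed pairs.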
Edited by Zumbruk on Sunday 14th October 11:14
Edited by Zumbruk on Sunday 14th October 11:50
Edited by PetrolTed on Monday 15th October 09:56