What's a good Perl regex to untaint an absolute path?

Well, I tried and failed so, here I am again.

I need to match my abs path pattern.


I am running in taint mode, with the following at the top of the script:

#!/usr/bin/perl -Tw
use CGI::Carp qw(fatalsToBrowser);
use strict;
use warnings;
$ENV{PATH} = "bin:/usr/bin";

I need to open the same file a couple times or more and taint forces me to untaint the file name every time. Although I may be doing something else wrong, I still need help constructing this pattern for future reference.

my $file = "$var[5]";
if ($file =~ /(\w{1}[\w-\/]*)/) {
$under = "/$1\.cnt";
} else {

You can see by my beginner attempt that I am close to clueless.

I had to add the forward slash and extension to $1 due to my poorly constructed, but working, regex.

So, I need help learning how to fix my expression so $1 represents /public_html/mystuff/10000001/001/10/01.cnt

Could someone hold my hand here and show me how to make:

$file =~ /(\w{1}[\w-\/]*)/ match my absolute path /public_html/mystuff/10000001/001/10/01.cnt ?

Thanks for any assistance.

13.10.2009 19:40:47
By the way, $file = $var[5]; is enough; no need to quote $var[5]. See perldoc.perl.org/… Also, I am sure you realize, @var is a bad name.
Sinan Ünür 13.10.2009 20:12:39
Thank you. Yes, "var" was just off the cuff in my attempt to be as clear as possible in the question. I am still learning the double quote or not rule. I do understand single quotes -vs- double so, I am actually learning.
Jim_Bo 13.10.2009 20:22:48
To answer the final question more directly: /(\w{1}[\w-\/]*)/ will not capture the leading forward slash or the extension, because the match has to start on a word character and the character class does not allow a period. You would need something more like /(\/?\w[\w\-\/.]*)/. Notice that I added an optional leading forward slash inside the capture group, and I allowed a period in the character class. The answers below are still better, of course (more specific is better when possible), but for the sake of learning, I think it is important to have this answer as well. :)
Rini 14.10.2009 15:06:42

Edit: Using $ in the pattern (as I did before) is not advisable here because it can match \n at the end of the filename. Use \z instead because it unambiguously matches the end of the string.
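
To illustrate the difference, here is a minimal sketch. The input string with a trailing newline is made up for the demonstration and is not taken from the question:

```perl
use strict;
use warnings;

# Hypothetical input: an otherwise valid name with a trailing newline.
my $name = "/public_html/mystuff/10000001/001/10/01.cnt\n";

# $ also matches just before a newline at the end of the string.
my $dollar_matches = $name =~ /\.cnt$/ ? 1 : 0;

# \z matches only at the very end of the string.
my $z_matches = $name =~ /\.cnt\z/ ? 1 : 0;

print "with \$: $dollar_matches, with \\z: $z_matches\n";
```

With the $ anchor the newline-terminated name is accepted; with \z it is rejected.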

Be as specific as possible in what you are matching:

my $fn = '/public_html/mystuff/10000001/001/10/01.cnt';

if ( $fn =~ m!
    ^(
        /public_html
        /mystuff
        /[0-9]{8}
        /[0-9]{3}
        /[0-9]{2}
        /[0-9]{2}\.cnt
     )\z!x ) {
     print $1, "\n";
}

Alternatively, you can reduce the vertical space taken by the code by putting what I assume to be a common prefix, '/public_html/mystuff', in a variable, combining the various components in a qr// construct (see perldoc perlop), and then using the conditional operator ?: to assign the result:


use strict;
use warnings;

my $fn = '/public_html/mystuff/10000001/001/10/01.cnt';
my $prefix = '/public_html/mystuff';
my $re = qr!^($prefix/[0-9]{8}/[0-9]{3}/[0-9]{2}/[0-9]{2}\.cnt)\z!;

$fn = $fn =~ $re ? $1 : undef;

die "Filename did not match the requirements" unless defined $fn;
print $fn, "\n";

Also, I cannot reconcile using a relative path as you do in

$ENV{PATH} = "bin:/usr/bin";

with using taint mode. Did you mean

$ENV{PATH} = "/bin:/usr/bin";
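
For reference, a minimal sketch of the environment cleanup that perldoc perlsec suggests for taint mode. The exact directory list is an assumption and should match your system:

```perl
use strict;
use warnings;

# Absolute directories only: under taint mode every PATH entry must be
# absolute, so a relative entry such as "bin" is rejected when you run
# an external command.
$ENV{PATH} = '/bin:/usr/bin';

# perlsec also recommends removing these variables to make %ENV safer.
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)};
```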
13.10.2009 20:50:16
Thank you @Sinan, I was not aware that the $ENV{PATH} was not correct. Everything was working properly but, that may have been an issue in the future. Thanks. I was real close on that pattern too in one attempt! I left out the $!x on the end and I gave up in frustration, adapting the one in my question. Thanks again.
Jim_Bo 13.10.2009 19:59:06
@Jim_Bo: I thought this was less cluttered but I'll combine the two.
Sinan Ünür 13.10.2009 20:03:28
@Sinan, could you also put your original answer up there as well? It was a little more verbose but helpful in my learning curve. Thanks.
Jim_Bo 13.10.2009 20:04:54
The only suggestion I would have is to use File::Spec along with what you're already doing in this example.
genio 13.10.2009 20:20:16
@genio, I luv modules but, in this case I wanted to avoid using any modules and keep it as small as possible. I would even like to get rid of the CGI module if I can. I am unaware if I can. I will try though.
Jim_Bo 13.10.2009 20:27:06

You talk about untainting the file path every time. That's probably because you aren't compartmentalizing your program steps.

In general, I break up these sorts of programs into stages. One of the earlier stages is data validation. Before I let the program continue, I validate all the data that I can. If any of it doesn't fit what I expect, I don't let the program continue. I don't want to get halfway through something important (like inserting stuff into a database) only to discover something is wrong.

So, when you get the data, untaint all of it and store the values in a new data structure. Don't use the original data or the CGI functions after that. The CGI module is just there to hand data to your program. After that, the rest of the program should know as little about CGI as possible.
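
A rough sketch of such a validation stage follows. The field names, the patterns in %rule, and the validate_all helper are all hypothetical, and the hard-coded values stand in for whatever the CGI layer hands over:

```perl
use strict;
use warnings;

# Hypothetical per-field patterns; each one captures the whole accepted value.
my %rule = (
    file  => qr!\A(/public_html/mystuff/[0-9]{8}/[0-9]{3}/[0-9]{2}/[0-9]{2}\.cnt)\z!,
    count => qr!\A([0-9]{1,6})\z!,
);

# Validate every field up front and die before any real work if one fails.
sub validate_all {
    my (%raw) = @_;
    my %clean;
    for my $field (sort keys %rule) {
        my $value = defined $raw{$field} ? $raw{$field} : '';
        if ($value =~ $rule{$field}) {
            $clean{$field} = $1;    # a capture from a successful match is untainted
        }
        else {
            die "Invalid value for '$field'\n";
        }
    }
    return \%clean;
}

# Stand-in for data that would normally come from CGI parameters.
my $clean = validate_all(
    file  => '/public_html/mystuff/10000001/001/10/01.cnt',
    count => '42',
);
print "$clean->{file}\n";
```

The rest of the program would then read only from $clean, so each value is untainted once instead of before every open.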

I don't know what you are doing, but it's almost always a design smell to take actual filenames as input.
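
One possible alternative, sketched here under assumptions: accept only the numeric components and let the program build the path itself. The counter_path helper, its parameter names, and the meaning of each component are guesses based on the example path in the question:

```perl
use strict;
use warnings;

# Fixed by the program, never supplied by the user.
my $base = '/public_html/mystuff';

# Build the counter file path from numeric components (hypothetical names).
sub counter_path {
    my ($id, $group, $part, $item) = @_;
    my @spec = ([$id, 8], [$group, 3], [$part, 2], [$item, 2]);
    my @clean;
    for my $s (@spec) {
        my ($value, $width) = @$s;
        # Each component must be exactly $width digits and nothing else.
        defined $value && $value =~ /\A([0-9]{$width})\z/
            or die "Bad path component\n";
        push @clean, $1;
    }
    return sprintf '%s/%s/%s/%s/%s.cnt', $base, @clean;
}

print counter_path('10000001', '001', '10', '01'), "\n";
```

Because the user never supplies slashes or dots, there is no way to steer the program outside $base.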

13.10.2009 22:15:44