Updates from August, 2011

  • danw 10:54 am on 2011-08-16 Permalink | Reply
Tags: perl

    sarpipe.pl – machine readable solaris sar output 

I’ve been scripting somewhat with sar on Solaris 10. The major problem is that, for some reason, there is no flag for machine-readable output (i.e. something easy to import into spreadsheets or other scripts). The most important part was adding a time field to each block device line, which makes it much easier to build per-disk statistics.

So I threw one together, sarpipe.pl:

#!/usr/bin/perl -w
use strict;

#default field delimiter/separator
my $delim = "|";

while($_ = shift @ARGV) {
        if($_ =~ m/--delim/) {
                #change the default field delimiter
                $delim = shift(@ARGV);
        }
        else {
                die "Usage: sar [-A...] | $0 [--delim separator]\n";
        }
}

#preset so we don't get any concat empty val errors
my $latesttime = "";
#loop through the sar output
while(<>) {
        chomp;
        #catch time field of output, remove from line
        if($_ =~ s/^(\d\d[:]\d\d[:]\d\d|Average)//) {
                $latesttime = $1 . $delim;
        }
        #remove leading and trailing whitespace
        $_ =~ s/(^\s+|\s+$)//g;
        #replace runs of whitespace with the field delimiter
        $_ =~ s/\s+/$delim/g;
        #if the line contains any content, print time field and line
        print $latesttime . $_ if($_ =~ m/^.+$/);
        print "\n";
}
    

    In use:

    user@example$ ./sarpipe.pl  -h
Usage: sar [-A...] | ./sarpipe.pl [--delim separator]
    user@example$ sar -d | ./sarpipe.pl | more
    
    SunOS|bcaeao|5.10|Generic_144488-10|sun4u|08/16/2011
    
    00:00:00|device|%busy|avque|r+w/s|blks/s|avwait|avserv
    
    00:10:01|md110|0|0.0|0|1|0.0|29.9
    00:10:01|md111|0|0.0|0|0|0.0|0.0
    00:10:01|md115|1|0.0|2|30|0.0|15.3
    00:10:01|md116|0|0.0|0|0|0.0|12.8
    00:10:01|md120|0|0.0|0|1|0.0|27.4
    00:10:01|md121|0|0.0|0|0|0.0|0.0
    00:10:01|md125|1|0.0|2|30|0.0|13.4
    00:10:01|md126|0|0.0|0|0|0.0|13.0
    00:10:01|md130|0|0.0|0|1|0.0|0.0
    ...
    Average|ssd35,c|0|0.0|0|0|0.0|0.0
    Average|ssd35,g|1|0.0|2|179|0.0|10.1
    Average|ssd36|0|0.0|0|0|0.0|0.0
    Average|ssd36,a|0|0.0|0|0|0.0|0.0
    Average|ssd36,b|0|0.0|0|0|0.0|0.0
    Average|ssd36,c|0|0.0|0|0|0.0|0.0
    Average|ssd36,f|0|0.0|0|0|0.0|0.0
    Average|ssd36,g|0|0.0|0|0|0.0|0.0
    Average|ssd36,h|0|0.0|0|0|0.0|0.0
    Average|ssd38|0|0.0|1|5|0.0|2.2
    Average|ssd38,c|0|0.0|0|0|0.0|0.0
    Average|ssd38,g|0|0.0|1|5|0.0|2.2
    user@example$ 
    

As you can see, it just reformats the sar output so it’s easy to consume. It does not remove empty lines or the annoying pseudo block devices (ssd36,h). The default delimiter is a pipe because a comma would clash with device names that contain commas. If you change the delimiter on the command line (--delim), be aware of shell escaping (e.g. --delim \! rather than a bare --delim !).
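Once it’s delimited, per-device reports are one-liners. For example, pulling the time and r+w/s columns for a single device (md115 from the output above; column positions follow the header line):

#time and r+w/s for one device, ready for a spreadsheet
sar -d | ./sarpipe.pl | awk -F'|' '$2 == "md115" { print $1 "," $5 }'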

     
  • danw 11:37 am on 2011-07-13 Permalink | Reply
    Tags:   

    Solaris is cool and stuff 

While researching a problem with backing up Solaris zones I stumbled across a question about Solaris on Serverfault.com.

And since none of the answers really seemed all that good, I decided to drop some knowledge:

This question is funny; it is almost the perfect question for a shill to ask to highlight Solaris 10’s new features, but no one gave the pro-Solaris answer.

This is a textbook application of Solaris Zones. The shared kernel provided by Zones lowers the overhead of virtualization and increases speed dramatically. If you have a standard install in mind for a VPS (bash, apache2, php5, python 2.X, …) you can create a single “gold” zone to use as a template and clone it for new zones. Package repositories are available at sunfreeware and blastwave, providing pre-compiled packages so you don’t have to build your own unless you want to.

You can create your template, charge $X per VPS, and clone the template for each new customer: total config time of five minutes at most, zero if you script/automate it. Upgrading the “global” zone (the base system) will cascade those upgrades into the zones, or you can upgrade per zone, which is also highly automatable.
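The template-and-clone workflow is only a handful of commands. A rough sketch (zone names, paths, and config tweaks here are made up for illustration; see zonecfg(1M)/zoneadm(1M)):

#export the gold zone's config and adapt it for the new customer
zonecfg -z gold export -f /tmp/customer1.cfg
#...edit zonepath, network address, etc. in /tmp/customer1.cfg...
zonecfg -z customer1 -f /tmp/customer1.cfg
#clone the installed-but-halted template, then boot the new zone
zoneadm -z customer1 clone gold
zoneadm -z customer1 boot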

Solaris has kernel-space accelerated SSL encryption for supported hardware: expensive cards, Sun/Oracle Niagara 2 CPU based systems, and the new Nehalem systems with AES acceleration, which greatly increases the number of SSL-protected websites you can host per system (link: http://www.c0t0d0s0.org/archives/6926-Performance-Impact-of-kssl.html).
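Configuring it is a single ksslcfg run per site. A sketch from memory of the Solaris 10 ksslcfg(1M) syntax, so double-check the man page; the cert path and ports are illustrative:

#kernel proxy terminates SSL on 443 and hands cleartext to the web server on 8080
ksslcfg create -f pem -i /etc/keys/server.pem -p /etc/keys/passfile -x 8080 443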

Solaris 10 has many new features in resource management, allowing you to segregate individual zones/processes/groups/users and keep a runaway or compromised application in one zone/group/user from impacting any others, on top of all the normal POSIX resource controls on memory use, file descriptors, etc.
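As an illustration, capping a zone is a few lines of zonecfg (capped-cpu/capped-memory arrived in Solaris 10 8/07; the zone name and limits are invented):

zonecfg -z customer1
  add capped-cpu
    set ncpus=2
  end
  add capped-memory
    set physical=4g
  end
  commit
  exit
#or adjust a running zone's LWP ceiling on the fly
prctl -n zone.max-lwps -v 3000 -r -i zone customer1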

Solaris 10 Zones (and Solaris 10 in general) were designed from the ground up to provide excellent security, accountability, and resource management, and to dovetail nicely with Sun (and now Oracle) hardware offerings. When released, the Sun T5240 + Sun Solaris + Solaris Zones package was the best platform for page views per second for the money.

In terms of technical merits, Solaris Zones is probably the best VPS solution available. But, as is usually the case, the real issue is requirements and costs. Licensing, support costs, and Niagara 2 or newer CPU hardware costs are rising with the Oracle takeover.

So evaluate the following: will the higher VPS density, better VPS isolation, and whiz-bang features compensate for higher licensing costs (if using Oracle Solaris), a smaller user base to draw peer support from, higher hardware costs (for SSL acceleration), the cost of supporting yet-another-OS, the cost of hiring people to support yet-another-OS, and the longer time it takes for security patches to get released?

If you already have a Windows team, do you really want to hire a Solaris team just to shave a few percent off of your hardware bill? Stick with Hyper-V until it’ll save you money to switch. If you already have a large deployment of Solaris systems, then go with Solaris. If you have a large Linux skill pool to draw on, do a Solaris trial and see how much extra time it takes three admins to learn the differences and maintain a new environment for six months.

But technology should almost never dictate your business decision process. Much as I hate to say it, for most service providers it makes more sense to provide a Windows-based VPS system than a Solaris one. Unless you know now that you’re going to need the feature set, and that the advantages are going to save you lots of Time And Money(TM), you probably don’t want Solaris.

But if this isn’t for a business and is more about having fun, then go ahead, use Solaris! It’s a lot of fun and has tons of features and options you’ve never even thought of if you’re coming from a non-commercial Unix background. The deeper you get into Solaris, the more you learn about smart engineering and new ways of solving technical problems. I’ve yet to see a Linux box with a “load average: 1000.0+, 1000.0+, 1000.0+” that was responsive and easy to recover.

@symcbean: I know Solaris (or Slow-laris, as it is sometimes called) has a reputation for poor performance (e.g. your fork example), but I seem to recall that the “Solaris Internals” book said the threading was re-engineered significantly for Solaris 10, and process creation/forking performance was among the industry leaders. The LWP framework, where each thread in an app is mapped to its own lightweight process in kernel space, apparently gave a big boost to performance, reliability, and accounting. The big hurdles for Solaris aren’t so much technical as operational (bad UI), cultural (small user base), and political (Oracle).
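If you want to watch that LWP mapping yourself, per-thread microstate accounting is built right in (httpd is just a stand-in process name):

#one line per LWP, with USR/SYS/LAT microstate columns, refreshed every 5s
prstat -mL -p `pgrep -d, httpd` 5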

    Link to original

     
  • danw 5:10 pm on 2010-06-01 Permalink | Reply
    Tags: solaris liveupgrade   

    Solaris Live Upgrade: NOOOOOOOOO! 

    I’ve been having a lot of fun with Solaris Live Upgrade at work lately. I’ve discovered a few interesting things that I thought I should share.

Live Upgrade can down your server if you’re not careful

I don’t know why, but I’ve managed to down my server twice in the last week trying to create Live Upgrade boot environments. One zone lost the ability to see any of its mounted directories, thereby scaring the crap outta the DBA and requiring a zone reboot to fix; another attempt left an abandoned cpio process copying data to the root file system. While that didn’t cause a crash, it could have broken some processes, and SMF requires free space in /etc to operate correctly (i.e. to save the state of crashed services).

    Live Upgrade lucreate doesn’t fail cleanly

If Live Upgrade’s lucreate fails for any reason, it is very hard to recover. You can’t unconfigure the new boot environment, you can’t delete the new boot environment; you can only complete it, and that doesn’t work if, say, there isn’t enough physical space or another hardware problem emerges.

    Live Upgrade ludelete doesn’t work most of the time

If you accidentally destroy the metadevices or ZFS file systems that Live Upgrade expects to exist in a boot environment, you cannot delete it. If the boot environment is “incomplete” you can’t delete it; you pretty much can’t do anything with ludelete except remove pristine Live Upgrade environments. That is, only about 10% of the boot environments you actually want to delete.

    Live Upgrade is iffy at best

So far this week I’ve had Live Upgrade refuse to patch zones because a single temp file didn’t copy correctly during boot environment creation. I’ve had lucreate mangle zone names and then complain that the mangled name doesn’t exist. If you have a zone that mounts a file system, you have to include it in an exclude list file, or Live Upgrade will try to copy the contents of that additional file system onto your zone’s root drive.

    How I Live upgrade

1. tar up /etc/lu*
2. ls -al /tmp for each zone, including the global zone
3. create an exclude file listing everything you don’t want on the zone’s root filesystem
4. create a new Live Upgrade boot environment
5. luupgrade either to a new version of Solaris or to install patches
6. luactivate your new boot environment
7. reboot as instructed via init 6
8. confirm that the correct disk/filesystem is booting
9. delete /etc/lu*
10. restore /etc/lu* from the tar file
11. If at any time something doesn’t work right, blow away all of the Live Upgrade configuration and restore it from the tar file. Also remove any leftover Live Upgrade files from the zones under /tmp.
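In shell terms, the happy path of that list looks roughly like this (the BE name, patch directory, patch IDs, and exclude paths are placeholders; check lucreate(1M)/luupgrade(1M) for the exact flags and exclude-file format):

tar cf /var/tmp/lu-backup.tar /etc/lu*            #1. save the LU config
ls -al /tmp /zones/*/root/tmp                     #2. note stray files per zone
cat > /var/tmp/lu-exclude <<'EOF'                 #3. keep these off the new BE
/mnt/netapp
/export/scratch
EOF
lucreate -n patched-be -f /var/tmp/lu-exclude     #4. build the alternate BE
luupgrade -t -n patched-be -s /var/tmp/patches 123456-01   #5. patch it (ID is fake)
luactivate patched-be                             #6. mark it active
init 6                                            #7. reboot the documented way
#8. verify the right BE booted, then:
rm -rf /etc/lu*                                   #9.
(cd / && tar xf /var/tmp/lu-backup.tar)           #10. restore the saved config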

This is probably quite bad advice, but I find Live Upgrade only seems to work the first time you use it. All subsequent times you get stuck with missing file systems that you removed since the last upgrade, weird file access errors, mangled zone names, file systems filled to 100% with data you didn’t want copied, and leftover processes changing things you probably didn’t want changed.

     
  • danw 10:42 pm on 2009-11-01 Permalink | Reply
    Tags: networking   

    dhcpdump and patch 

    I was at work the other day and the whole network went down. DHCP and upstream routers went out and the network went kinda nuts. I tried dhcpdump to get a look at the traffic on the local network segment.

Sadly it doesn’t work with pcap files (from tcpdump or wireshark); it only works with live network interfaces and must be run as root. So I decided to add pcap file support to dhcpdump 1.8. I tried contacting the author but didn’t hear back, so I decided to put it up here.

    pcapfile.patch

    • add option to read from pcap file
    • change to read time from pcap packet header
    --- dhcpdump-1.8/dhcpdump.c     2008-06-23 20:26:52.000000000 -0700
    +++ dhcpdump.c  2009-09-30 15:22:10.000000000 -0700
    @@ -71,7 +71,7 @@
     void   printHexString(u_char *data, int len);
    
     void usage() {
-       printf("Usage: $0 <-i interface> [-h macaddress]\n");
+       printf("Usage: $0 <-i interface|-f pcapfile> [-h macaddress]\n");
            exit(0);
     }
    
    @@ -80,6 +80,7 @@
            pcap_t *cap;
            struct bpf_program fp;
            char    *interface = NULL;
    +       char    *pcap_file = NULL;
    
            for (i = 1; i < argc; i++) {
                    if (argv[i] == NULL || argv[i][0] != '-') break;
    @@ -90,6 +91,9 @@
                    case 'i':
                            interface = argv[++i];
                            break;
    +               case 'f':
    +                       pcap_file = argv[++i];
    +                       break;
                    default:
fprintf(stderr, "%s: %c: uknown option\n",
                                argv[0], argv[i][1]);
    @@ -97,13 +101,24 @@
                    }
            }
    
    -       if (interface == NULL) usage();
    +       if (    //no interface or pcap file specified
    +               ((interface == NULL) && (pcap_file == NULL)) ||
    +               //both an interface and a pcap file specified
    +               ((interface != NULL) && (pcap_file != NULL))
    +               )
    +               usage();
    
            if (hmask)
                    regcomp(&preg, hmask, REG_EXTENDED | REG_ICASE | REG_NOSUB);
    
    -       if ((cap = pcap_open_live(interface, 1500, 1, 100, errbuf)) == NULL)
    -               errx(1, "pcap_open_live(): %s", errbuf);
    +       if (interface != NULL) {
    +               if ((cap = pcap_open_live(interface, 1500, 1, 100, errbuf)) == NULL)
    +                       errx(1, "pcap_open_live(): %s", errbuf);
    +       }
    +       else {
    +               if ((cap = pcap_open_offline(pcap_file,errbuf)) == NULL)
    +                       errx(1, "pcap_open_offline(): %s", errbuf);
    +       }
            if (pcap_compile(cap, &fp, "udp and (port bootpc or port bootps)", 0, 0) < 0)
                    errx(1,"pcap_compile: %s", pcap_geterr(cap));
            if (pcap_setfilter(cap, &fp) < 0)
    @@ -148,12 +163,11 @@
            offset += sizeof(struct udphdr);
    
            {
    -               struct timeval tp;
    -               gettimeofday(&tp, NULL);
    +               //get time from pcap to enable reading from capture file
                    strftime(timestamp, sizeof(timestamp), "%Y-%m-%d %H:%M:%S.",
    -                   localtime(&(tp.tv_sec)));
    +                   localtime(&(h->ts.tv_sec)));
                    sprintf(timestamp + strlen(timestamp), "%03ld",
    -                   tp.tv_usec / 1000);
    +                   h->ts.tv_usec / 1000);
            }
    
            strcpy(mac_origin, ether_ntoa((struct ether_addr *)eh->ether_shost));
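With the patch applied, post-mortem debugging becomes a two-step job: capture with tcpdump during the outage, decode at leisure (the interface name and file path are just examples):

#capture only DHCP traffic, then decode the file offline (no root needed for -f)
tcpdump -i hme0 -s 1500 -w /tmp/dhcp.pcap 'udp and (port bootpc or port bootps)'
./dhcpdump -f /tmp/dhcp.pcap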
    
     
  • danw 12:22 pm on 2009-11-01 Permalink | Reply
    Tags:   

    Patching 

Okay, so I’m stuck at work trying to patch some computers on Sunday morning so I can get Live Upgrade working (see here) to shorten the outage for an impending upgrade/patch cycle. Boo. Then I manage to break said computer attempting a rollback. Double boo.

Now I’m doing the ghettoest of the ghetto restore techniques. Netbackup? No. NetApp’s magical snap client? No.
Tar baby!


    (cd /mnt/netapp/mount/point/root && tar cEf - *) | (cd / && tar xpf -)

Hint: Gotta have the star ’cause a dot would include NetApp-specific hidden directories (like .snapshot)

    or even better

    ssh otherserver -l root "cd /mnt/netapp/mount/point/root && tar cEf - *" | (cd / && tar xpf -)

    ’cause I can’t use rsh (which would be much much faster [no encryption]). Yeah, the speed of ssh.

The moral of the story? Don’t try so hard. Don’t patch a machine with only the specific minimal patches you need so you can use Live Upgrade to make patching and upgrading faster. Do a full 10_Recommended (2.6GB of patches [as of Jun 2009]), then any remaining patches, then get Live Upgrade working. So what if it takes an extra 24 hours of outage time. At least you know it’s going to work.
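For the record, applying the full cluster is simple, assuming the 2009-era 10_Recommended layout with its bundled install script (best done in single-user mode):

unzip 10_Recommended.zip
cd 10_Recommended
./install_cluster    #wraps patchadd, skips patches already applied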

     
  • danw 5:18 pm on 2006-10-31 Permalink | Reply  

    note to self 

When updating the Linux kernel to a better version, also update the things that consider the kernel really, really important.
-dan

     
    • adamk 1:34 am on 2006-12-16 Permalink | Reply

of course it’s just a rarely held thought that the kernel API should stay unchanged in anything short of a major version change… 🙂
