
Windows Service works for years/months/weeks, then chokes

Author
15 Feb 2006 4:09 PM
Trevor
This is driving us mad - please help!

Back in 2003, I coded a Windows Service in VB.NET for framework v1.1.4322.
I deployed it in Nov. 2003, and it worked fine until the end of May 2005,
when it choked (see below).  We restarted it, and it worked fine for another
7 months until it choked again at the end of Dec. 2005.  It has now failed
again (mid Feb. 2006).  So it's failing more frequently now?

OnStart, the service reads a number of settings from the configuration file.
Among those settings are:
- a path for a FileSystemWatcher
- a time when the file is expected to arrive, used to set a timer

FileSystemWatcher is set to watch the specified path for the creation
(arrival) of a *.CSV file from another system.  When it arrives, its
contents are read and loaded into a database.  In return, other data is
collected from the database and put in an output file (*.OUT).  Then, a
global variable called LastExecuted is set to Now().  Declaration is at the
top of the code:  Private LastExecuted As Date.

If the time value is 02:00, then we expect file arrival at 2:00AM.
Therefore, the timer is set to the millisecond interval between now and the
next 3:00AM that comes around, one hour after the expected arrival (whether
that's today or tomorrow, depending on the current time).  On Timer_Elapsed,
I check that LastExecuted is within the last 75 minutes.  If not, the file
failed to arrive today, so I make an entry in the EventLog and SmtpMail.Send
an error message.  Then I re-read the configuration file (in case the
expected arrival time was changed) and reset and restart the Timer (which
typically results in a 24-hour period).
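
The scheduling described above can be sketched as a small function of the current time. This is only an illustration, not the service's actual code: the names `NextCheckTime`, `currentTime` and `expectedArrival` are made up here, it is written in current VB.NET rather than for framework 1.1, and it assumes the expected arrival is before 23:00 so the check lands on the same calendar day. Comparing complete date-time values rather than just the hour is a design choice of this sketch.

```vbnet
Imports System

Module NextCheckSketch
    ' Hypothetical helper: returns the next moment the daily check should run,
    ' one hour after the configured arrival time (e.g. 02:00 -> 03:00).
    ' Assumes expectedArrival is earlier than 23:00.
    Function NextCheckTime(currentTime As DateTime, expectedArrival As TimeSpan) As DateTime
        ' Today's check time, e.g. 02:00 + 1 hour = 03:00 today.
        Dim candidate As DateTime = currentTime.Date.Add(expectedArrival).AddHours(1)
        ' Compare full date-time values; if today's check is not in the
        ' future, move to tomorrow's.
        If candidate <= currentTime Then
            candidate = candidate.AddDays(1)
        End If
        Return candidate
    End Function

    Sub Main()
        Dim arrival As TimeSpan = TimeSpan.Parse("02:00")
        ' Exactly at the check time: the next check is a full day away.
        Dim atCheckTime As New DateTime(2006, 2, 15, 3, 0, 0)
        Dim nextCheck As DateTime = NextCheckTime(atCheckTime, arrival)
        Console.WriteLine(nextCheck.Subtract(atCheckTime).TotalHours)   ' 24
    End Sub
End Module
```

Taking the current time as a parameter keeps the calculation easy to test at boundary moments such as exactly 3:00AM.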

When the process chokes, this is what happens:
- the FileSystemWatcher worked as expected that day.
- the timer interval suddenly changes from 24 hours to less than a second.
Therefore, 75 minutes after the file was processed (75 minutes after
LastExecuted) several EventLog entries and e-mails are generated EVERY
SECOND.
- the process fails to update itself from the configuration file, even
though the call to that function is the next line of code to execute after
the SmtpMail.Send.

Result:  By 8AM, when people arrive for work, I get a call saying that the
server has sent them 5000+ error e-mails, and to please make it stop.  I
restart the service, and everything is fine.  I even have a debug mode which
tells me what the timer calculates for its intervals and what that
translates into.  The values are always correct, until it's been running
for a while.  Then the thing chokes.
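
As a stopgap against a mail flood like the one described above, one option is to cap how often an alert may go out, whatever the timer does. The sketch below is an illustration built on assumptions, not part of the original service: `AlertThrottle` and `ShouldSend` are hypothetical names, and the 12-hour gap is an arbitrary choice. It does not address why the interval collapses.

```vbnet
Imports System

' Hypothetical guard (not from the original service): permits at most one
' alert per minimum gap, however often the caller asks.
Class AlertThrottle
    Private ReadOnly _gate As New Object()
    Private ReadOnly _minimumGap As TimeSpan
    Private _lastSent As DateTime = DateTime.MinValue

    Sub New(minimumGap As TimeSpan)
        _minimumGap = minimumGap
    End Sub

    ' Returns True (and records the time) only when enough time has passed
    ' since the last alert that was allowed through.
    Function ShouldSend(currentTime As DateTime) As Boolean
        ' Timer callbacks can run on different threads, so guard the state.
        SyncLock _gate
            If currentTime.Subtract(_lastSent) < _minimumGap Then
                Return False
            End If
            _lastSent = currentTime
            Return True
        End SyncLock
    End Function
End Class

Module ThrottleDemo
    Sub Main()
        Dim throttle As New AlertThrottle(TimeSpan.FromHours(12))
        Dim firstAlert As New DateTime(2006, 2, 15, 3, 15, 0)
        Console.WriteLine(throttle.ShouldSend(firstAlert))                ' True
        Console.WriteLine(throttle.ShouldSend(firstAlert.AddSeconds(1)))  ' False
        Console.WriteLine(throttle.ShouldSend(firstAlert.AddHours(24)))   ' True
    End Sub
End Module
```

In a handler, the EventLog entry and the e-mail would only be produced when `ShouldSend` returns True.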

Did something change in the framework around May 2005?  Am I not doing some
necessary memory cleanup (I have Dim MyLog As New EventLog in every
function/sub - do I need to release that)?  The Timer and LastExecuted are
global variables used only once, right - there's nothing to clean up there,
is there?  I don't know if this is relevant, but there is another Timer in
my code - one that starts when the FileSystemWatcher is triggered, waits 20
seconds until FTP is done transferring the file, then stops.  I can't imagine
that interfering - or could it?  There's nothing wrong with my interval
calculation, is there?

    Private Function ProperTimerInterval() As Double
        Dim MyLog As New EventLog
        MyLog.Source = "MyCompany"

        Dim NextCheckTime As Date
        If CInt(Time.Text.Substring(0, 2)) < CInt(Date.Now.Hour) Then
            NextCheckTime = Date.Parse(Date.Now.AddDays(1).Date & " " & Time.Text)
        Else
            NextCheckTime = Date.Parse(Date.Now.Date & " " & Time.Text)
        End If
        NextCheckTime = NextCheckTime.AddHours(1)

        Dim IntervalToReturn As Double = _
            NextCheckTime.Subtract(Date.Now).TotalMilliseconds

        If Debug.Text.ToLower = "true" Then
            MyLog.WriteEntry("The daily timer is set to expire in " & _
                IntervalToReturn & " milliseconds, which is " & _
                IntervalToReturn / 1000 / 60 / 60 & " hours.", _
                EventLogEntryType.Information)
        End If

        Return IntervalToReturn
    End Function

Author
15 Feb 2006 6:22 PM
tomb
This sounds like a very complex app - and sounds nicely done.  The
difficulty with something like this is that you are relying on an
outside source to provide a file.  Scroll down.

Trevor wrote:

>If the time value is 02:00, then we expect file arrival at 2:00AM.
>Therefore, the timer is set at the proper millisecond interval between now
>and the next 3:00AM that comes around (whether that's today or tomorrow,
>depending on the current time).  On Timer_Elapsed, I check that LastExecuted
>is within the last 75 minutes.  If not, the file failed to arrive today, so
>make an entry in the EventLog and SmtpMail.Send an error message.  Then,
>re-read the configuration file (in case the expected arrival time was
>changed) and then reset and restart the Timer (which typically results in a
>24-hour period).
>
How often does the file not arrive as expected?  Is the expected arrival
time always modified?  If it is not modified, what is the result within
the application?  Does it affect the time_interval?
This may be a shot in the dark, but it's all I can think of.

Tom
Author
16 Feb 2006 1:35 AM
Stephany Young
You say that once the FileSystemWatcher is triggered, you wait another 20
seconds until FTP finishes transferring the file.

The types of things I would be looking at include:

  - How do you determine if the inwards FTP operation has completed?

  - What size is a 'normal' inward file and how long does it normally
    take to transfer?

  - What should happen if the inwards transfer does not complete within
    20 seconds?

  - On the days it 'choked' was the transferred file significantly larger
    than normal?

  - On the days it 'choked' what does the FTP log show?
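
On the first question, one approach sometimes used instead of a fixed 20-second wait is to poll until the file can be opened with no sharing. A minimal sketch, assuming the FTP server keeps the file open for writing without sharing it during the transfer (which may not hold for every server); `IsFileReady` and `WaitForFile` are hypothetical names, and the timeout values are arbitrary.

```vbnet
Imports System
Imports System.IO
Imports System.Threading

Module FileReadySketch
    ' Hypothetical helper: True once the file exists and can be opened
    ' exclusively, i.e. no other process still holds it open.
    Function IsFileReady(filePath As String) As Boolean
        Try
            Using stream As New FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.None)
                Return True
            End Using
        Catch ex As IOException
            ' Missing file or sharing violation: treat both as "not ready yet".
            Return False
        End Try
    End Function

    ' Polls until the file is ready or the timeout expires.
    Function WaitForFile(filePath As String, timeout As TimeSpan, pollInterval As TimeSpan) As Boolean
        Dim deadline As DateTime = DateTime.UtcNow.Add(timeout)
        Do
            If IsFileReady(filePath) Then Return True
            Thread.Sleep(pollInterval)
        Loop While DateTime.UtcNow < deadline
        Return False
    End Function

    Sub Main()
        ' A temporary file stands in for the arrived CSV.
        Dim csvPath As String = Path.GetTempFileName()
        Console.WriteLine(WaitForFile(csvPath, TimeSpan.FromSeconds(5), TimeSpan.FromMilliseconds(500)))
        File.Delete(csvPath)
    End Sub
End Module
```

A False result after the timeout gives the caller an explicit place to decide what happens when the transfer does not complete in time.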

I get the impression that this is a file that normally arrives sometime during
the early hours of the morning and is processed ready for the day ahead. If
this is the case, then what is the latest time the file could arrive before
one could assume that it is not going to arrive today, and how long does the
processing of the file take? Also, is it a requirement that the processing
of the file is finished before a certain time?

If the latest time for arrival was within a relatively short time of the
normal arrival time (say up to 1 hour of 2:00 AM) and the processing time
was relatively short (say 1 hour at the most) I would be inclined to use
the Windows Task Scheduler rather than a service. With the job scheduled for,
say, 4:00 AM it would be all over by 5:00 AM.
The job would simply check to see if the file exists, process it if it does
and report appropriately if it doesn't.
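
A job like that might look roughly like the following console program. It is a sketch only: the `DailyImportJob` module, the `Run` function, the command-line folder argument and the placeholder messages are all assumptions, and the real database load and *.OUT generation are left as comments.

```vbnet
Imports System
Imports System.IO

Module DailyImportJob
    ' Returns a process exit code: 0 when at least one file was found,
    ' 1 when nothing had arrived.
    Function Run(watchFolder As String) As Integer
        Dim files() As String = Directory.GetFiles(watchFolder, "*.csv")
        If files.Length = 0 Then
            ' Placeholder for the real report (event log entry, e-mail, ...).
            Console.Error.WriteLine("No CSV file found in " & watchFolder)
            Return 1
        End If
        For Each csvFile As String In files
            ' Placeholder for the real work: load the file into the
            ' database and write the *.OUT file.
            Console.WriteLine("Processing " & csvFile)
        Next
        Return 0
    End Function

    Function Main(args() As String) As Integer
        ' The folder comes from the command line; the current directory
        ' is a hypothetical default.
        Dim folder As String = If(args.Length > 0, args(0), ".")
        Return Run(folder)
    End Function
End Module
```

The exit code is there so that whatever launches the job can tell a missing file from a successful run. Because the process ends after each run, there is no long-lived timer state to go wrong.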


Author
16 Feb 2006 4:57 PM
Rick
My guess is that you have an overflow error in a variable used in the
timer. WinNT used to have a bug where it would crash after running for
something like a week. MS used an integer to track the number of
seconds since the last reboot and it would overflow in about a week. If
you have something that tracks the milliseconds since startup, you may
be facing a similar issue.
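
For scale, the arithmetic behind this kind of overflow can be checked directly. The sketch below only computes how long a 32-bit millisecond counter lasts before it wraps; it makes no claim about what the service or the framework timer does internally.

```vbnet
Imports System
Imports System.Globalization

Module TickOverflowSketch
    Sub Main()
        Const MillisecondsPerDay As Double = 24 * 60 * 60 * 1000

        ' Days until a millisecond counter of each width runs out.
        Dim signedDays As Double = Integer.MaxValue / MillisecondsPerDay
        Dim unsignedDays As Double = UInteger.MaxValue / MillisecondsPerDay

        Console.WriteLine(signedDays.ToString("F1", CultureInfo.InvariantCulture))     ' 24.9
        Console.WriteLine(unsignedDays.ToString("F1", CultureInfo.InvariantCulture))   ' 49.7
    End Sub
End Module
```

Both figures are far shorter than the uptimes reported above (roughly 18 months, then 7 months, then under 2 months, assuming no restarts in between), and a 24-hour interval of 86,400,000 ms is well below the signed 32-bit limit, so whether an overflow explains this case remains open.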