Are you using CentOS 7.2 or 7.3?
posted by Greg Whynott  on Oct. 4, 2017, 4:20 p.m. (1 month, 19 days ago)
We are seeing a freezing issue when the firefox browser starts and in dolphin when detail view is invoked. Anyone else?


What we are seeing is a few scenarios where dolphin or firefox will freeze; dragging the window around leaves trails all over the place. About 10-15 seconds later everything "lets go" and works perfectly for the rest of the day, unless...
You leave firefox idle for an hour or so (out for lunch, for example). Then it does the same thing: it freezes for up to 20 seconds.
We straced the process while it was stuck:
strace -v -tt -e all -p 13712

11:10:27.061893 mprotect(0x325b85f18000, 8192, PROT_READ|PROT_EXEC) = 0
11:10:27.061985 mprotect(0x325b85f19000, 4096, PROT_READ|PROT_WRITE) = 0
11:10:27.062022 mprotect(0x325b85f19000, 4096, PROT_READ|PROT_EXEC) = 0
11:10:27.062082 mprotect(0x325b85f19000, 4096, PROT_READ|PROT_WRITE) = 0
11:10:27.062117 mprotect(0x325b85f19000, 4096, PROT_READ|PROT_EXEC) = 0
11:10:27.062233 mprotect(0x325b85f19000, 4096, PROT_READ|PROT_WRITE) = 0
11:10:27.062276 mprotect(0x325b85f19000, 4096, PROT_READ|PROT_EXEC) = 0
11:10:27.062433 mprotect(0x325b85f19000, 8192, PROT_READ|PROT_WRITE) = 0
11:10:27.062483 mprotect(0x325b85f19000, 8192, PROT_READ|PROT_EXEC) = 0
11:10:27.062960 poll([{fd=4, events=POLLIN|POLLOUT}], 1, 4294967295) = 1 ([{fd=4, revents=POLLOUT}])
11:10:27.062998 writev(4, [{"(\0\4\0\\\0\340\0044\2\0\0\0\0\0\0", 16}, {NULL, 0}, {"", 0}], 3) = 16
11:10:27.063039 poll([{fd=4, events=POLLIN}], 1, 4294967295) = 1 ([{fd=4, revents=POLLIN}])
11:10:27.063122 recvmsg(4, {msg_name(0)=NULL, msg_iov(1)=[{"\1\1\243R\0\0\0\0\247i\241\0018\4\270\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 4096}], msg_controllen=0, msg_flags=0}, 0) = 32
11:10:27.063165 recvmsg(4, 0x7ffc82cf9e10, 0) = -1 EAGAIN (Resource temporarily unavailable)
11:10:27.063200 recvmsg(4, 0x7ffc82cf9e10, 0) = -1 EAGAIN (Resource temporarily unavailable)
11:10:27.063280 fcntl(85, F_SETLKW, {type=F_RDLCK, whence=SEEK_SET, start=652, len=1}


fd 85 is a file open in an NFS homedir: /path/~/.cache/event-sound-cache.tdb.dh095.x86_64-redhat-linux-gnu
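
The trace ends on a blocking fcntl(F_SETLKW) read lock that has not returned, so one way to reproduce the stall outside of firefox is to time the same kind of lock directly. The following is a minimal sketch, not the method used in this thread: the helper name time_read_lock is hypothetical, and the demo runs against a local temporary file, so you would point it at a file in the NFS home directory to compare. Python's fcntl.lockf() takes fcntl-style byte-range locks and blocks when LOCK_NB is not set.

```python
import fcntl
import os
import tempfile
import time


def time_read_lock(path, start=0, length=1):
    """Return the seconds spent waiting for a blocking read lock on a byte range."""
    fd = os.open(path, os.O_RDONLY)
    try:
        t0 = time.monotonic()
        # LOCK_SH without LOCK_NB blocks until the lock is granted, like the
        # F_SETLKW / F_RDLCK call in the strace output.
        fcntl.lockf(fd, fcntl.LOCK_SH, length, start, os.SEEK_SET)
        waited = time.monotonic() - t0
        fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)
        return waited
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Local temporary file for illustration only; substitute a path on the
    # NFS mount to see whether the lock request itself is what stalls.
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"x" * 1024)
        tmp.flush()
        print(f"lock wait: {time_read_lock(tmp.name, start=652):.3f}s")
```

If the wait is near zero on local disk but many seconds on the NFS path, that points at the lock path (client, network, or server) rather than at the application.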


----------o0o0o0o0o0o0o0o00--------------------
I installed a 4.x kernel on the machine and it *appears* to make things better. I'm not seeing the stalls today. Will deploy to a few TDs for further testing. BUT -- that is not a solution for us. We can't upgrade all our CentOS 7 machines to a 4.x kernel when they don't 'support' it. Lord knows what grief it'll bring when a required update is released for something else.

We also moved the home dir to another NFS server, then to the local machine; no change, it appears.

and lastly....
This bug was forwarded to me but I don't think it is related:

https://bugzilla.redhat.com/show_bug.cgi?id=917848


I say this because I tried the following and dolphin displays the directory without issue (there is a pause, but it seems natural for the number of files):

mkdir test;cd test

touch test{000001..999999}.txt
Moving into this directory with dolphin/firefox does not cause the freezing we are seeing a lot of complaints about.


as usual, thanks!
greg






Thread Tags:
  discuss-at-studiosysadmins 

Response from Greg Whynott @ Oct. 17, 2017, 5:50 p.m.
Forgot to send an update... marked as RESOLVED. :)


I'm embarrassed. (again!)

All those odd NFS issues we were seeing are, I believe, related to the firewall policy I had pushed to the new workstations, which happen to all be 7.3 machines. This is where the "it's only happening on the CentOS machines, it must be OS dist related" idea came from...
We are in the midst of migrating from SL 6.4 to CentOS 7.3, and while defining the kickstart I thought it would be a good idea to implement an endpoint firewall policy (partner studios had access to parts of our networks).


In my own defense, it wasn't obvious at first, as doing an 'ls' or poking around in your home dir works as well as expected. My guess is that it was lockd having the problem, which I will confirm now that I've found the "fix".
While kernel 4.x fixed a dolphin issue, where descending into a directory with 1000s of files would lock it up for a while, it did not address the slow firefox/LibreOffice issues. Disabling the firewall did.

icmp-port-unreachable... I saw that when I did a packet trace on the NFS server side, and as soon as I did I was like "no f'n way......".
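
For anyone chasing the same symptom: with NFSv3, file locking goes through separate RPC services (nlockmgr for lockd, status for statd) whose ports are registered with the portmapper, and these are easy to miss in a host firewall policy even when ordinary reads and writes work. Below is a minimal sketch, assuming the usual five-column layout of `rpcinfo -p` output, that pulls out the ports those two services are using so they can be compared against the firewall rules. The function name rpc_ports and the SAMPLE text (including its port numbers) are made up for illustration; they are not taken from the machines in this thread.

```python
def rpc_ports(rpcinfo_text, services=("nlockmgr", "status")):
    """Map (service, proto) to the set of ports listed in `rpcinfo -p` output."""
    ports = {}
    for line in rpcinfo_text.splitlines():
        fields = line.split()
        # Data rows have five columns: program, vers, proto, port, service.
        # The header row is skipped because its first field is not numeric.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        _program, _vers, proto, port, service = fields
        if service in services:
            ports.setdefault((service, proto), set()).add(int(port))
    return ports


# Hypothetical sample output; real port numbers will differ per host.
SAMPLE = """\
   program vers proto   port  service
    100000    4   tcp    111  portmapper
    100024    1   udp  40321  status
    100024    1   tcp  38765  status
    100021    1   udp  45123  nlockmgr
    100021    3   udp  45123  nlockmgr
    100021    4   tcp  33219  nlockmgr
"""

if __name__ == "__main__":
    for (service, proto), found in sorted(rpc_ports(SAMPLE).items()):
        print(service, proto, sorted(found))
```

In practice the text would come from running `rpcinfo -p` on the workstation and on the NFS server. Whether these ports were actually the ones being rejected here is not confirmed in the thread; the lockd guess above is still a guess.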


greg





To unsubscribe from the list send a blank e-mail to mailto:studiosysadmins-discuss-request@studiosysadmins.com?subject=unsubscribe





Response from Greg Whynott @ Oct. 5, 2017, 11:15 a.m.
Thanks Michael,
We are on 7.3; there are a few 7.2 machines in the environment, but we are seeing the issue on both. The issue does/did not happen on the SL 6.4 machines we were using prior to moving to CentOS 7.3.
Today I was provided a new strace dump and it appears to be related to NFS to the Isilon, but this confuses me, as we moved a home dir local for a test a few weeks ago and I was told the problem still existed. I'll be revisiting that today.
I talked to the pipeline decision makers, and they do not want to go to 7.4 yet.
My plan for the day/week:
- move home dir local again and have them test.
- if the problem goes away then we can look at the network or the NFS server. I'll try moving the home directory off the Isilon onto the HDS and see what that gets us.
I've tomorrow off, so have a good weekend!
greg




Response from Mike Rochefort @ Oct. 4, 2017, 6:15 p.m.
Hi Greg,
Are you using 7.2? I personally don't use KDE, but I know multiple people who do, and they have never seen that issue. The 7.3 branch has been rock solid for me personally and my machines. Would it be possible to try a different browser to test if it's specific to FF?

I could be talking out of my ass here as I don't really know the specifics, but if upgrading your kernel supposedly fixed it then maybe it's a scheduler issue? Again, not positive on that statement. 
I can't really recommend it yet, but 7.4 backported a lot of things, I think, from the 4.x branch. I ran tests previously on 7.3, and the 4.x kernel seemed to lead the stock kernel in performance by a small margin. The kernel in 7.4 leveled the playing field. Maybe they backported the changes that fix your problem? I would recommend updating to 7.3 though if you haven't (at least on a few machines). All of the apps I run (which is probably fewer than you do) work phenomenally.
Cheers,
Mike
