[Macchiato] Network driver crash

Sat Mar 3 18:21:37 GMT 2018

Hey folks,

So I've dug into this a bit more. The crash is always reproducible
for me when I'm logged in via ssh and run wget in a loop to fetch a
large file. It always comes right after a flood of warnings:

...
[ 1107.762610] mvpp2 f4000000.ethernet eth2: wrong cpu on the end of Tx processing                     
[ 1107.962695] mvpp2 f4000000.ethernet eth2: wrong cpu on the end of Tx processing                     
[ 1108.162688] mvpp2 f4000000.ethernet eth2: wrong cpu on the end of Tx processing                     
[ 1108.362726] mvpp2 f4000000.ethernet eth2: wrong cpu on the end of Tx processing                     
[ 1108.562789] mvpp2 f4000000.ethernet eth2: wrong cpu on the end of Tx processing                     
[ 1108.572577] mvpp2 f4000000.ethernet eth2: wrong cpu on the end of Tx processing                     
[ 1108.579949] Unable to handle kernel paging request at virtual address e0000010f                     
[ 1108.587296] Mem abort info:                                                                         
[ 1108.590102]   ESR = 0x96000004                                                                      
[ 1108.593177]   Exception class = DABT (current EL), IL = 32 bits                                     
[ 1108.599121]   SET = 0, FnV = 0                                                                      
[ 1108.602191]   EA = 0, S1PTW = 0                                                                     
[ 1108.605348] Data abort info:                                                                        
[ 1108.608245]   ISV = 0, ISS = 0x00000004                                                             
[ 1108.612099]   CM = 0, WnR = 0                                                                       
[ 1108.615087] user pgtable: 4k pages, 48-bit VAs, pgdp = 000000004167bcd2                             
[ 1108.621729] [0000000e0000010f] pgd=0000000000000000                                                 
[ 1108.626640] Internal error: Oops: 96000004 [#1] SMP                                                 
[ 1108.631539] Modules linked in: nls_ascii(E) nls_cp437(E) vfat(E) fat(E) mcs7830(E) usbnet(E) mii(E))
[ 1108.690168] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G            E    4.16.0-rc3+ #17                
[ 1108.697944] Hardware name: Marvell Armada 8040 MacchiatoBin/Armada 8040 MacchiatoBin, BIOS EDK II N7
[ 1108.707900] pstate: 40000005 (nZcv daif -PAN -UAO)                                                  
[ 1108.712719] pc : consume_skb+0x1c/0xd0                                                              
[ 1108.716485] lr : __dev_kfree_skb_any+0x58/0x68                                                      
[ 1108.720946] sp : ffff00000801bc10                                                                   
[ 1108.724273] x29: ffff00000801bc10 x28: ffffa315e86a6900                                             
[ 1108.729611] x27: ffffa315ec2e0000 x26: 0000000000000001                                             
[ 1108.734948] x25: ffff18ba30ff8130 x24: ffff18ba30ff8130                                             
[ 1108.740285] x23: ffffa315e86a6950 x22: 0000000000000018                                             
[ 1108.745621] x21: 000000000000000b x20: 0000000000000001                                             
[ 1108.750957] x19: 0000000e0000002b x18: ffffffffffffffff                                             
[ 1108.756293] x17: 0000ffffab4efe18 x16: ffff18ba30e80190                                             
[ 1108.761619] x15: ffff18ba31669b88 x14: ffff18bab17c8bbf                                             
[ 1108.766955] x13: ffff18ba317c8bcd x12: 0000000000000000                                             
[ 1108.772291] x11: 00000000000001fa x10: 0000000000000053                                             
[ 1108.777627] x9 : ffff18ba30d3bf18 x8 : 6f72702078542066                                             
[ 1108.782963] x7 : ffff18ba3135a010 x6 : 00000000bb810000                                             
[ 1108.788300] x5 : ffff18ba31800798 x4 : 0000000000000000                                             
[ 1108.793635] x3 : 0000000000000001 x2 : 0000000e0000002b                                             
[ 1108.798971] x1 : 0000000000000001 x0 : ffff18ba30e801e8                                             
[ 1108.804309] Process swapper/3 (pid: 0, stack limit = 0x00000000c1096792)                            
[ 1108.811038] Call trace:                                                                             
[ 1108.813495]  consume_skb+0x1c/0xd0                                                                  
[ 1108.816911]  __dev_kfree_skb_any+0x58/0x68                                                          
[ 1108.821038]  mvpp2_txq_bufs_free.isra.52+0x8c/0x118 [mvpp2]                                         
[ 1108.826642]  mvpp2_txq_done.isra.67+0xb8/0xf8 [mvpp2]                                               

The crash is clearly in the TX code rather than the RX code. Looking
at the driver, there's a set of per-cpu transmit queues, and that
warning about "wrong cpu" looks quite worrying.

Naive question for Marcin: if the driver has got the wrong CPU when
cleaning up in the per-cpu queues, isn't that a major problem? I'm
assuming that the point of doing this with separate queues is to avoid
the cost of locking, so if you've got one CPU working on another's
queue this is not going to go well...
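
To make that worry concrete, here is a toy model of a lock-free per-cpu
queue being cleaned up by two CPUs at once. It is NOT the mvpp2 code:
the TxQueue class, the three-step reclaim and the buffer names are all
invented for illustration, and the interleaving is forced by hand so
the result is deterministic.

```python
# Toy model only: a per-CPU Tx queue with no lock, reclaimed by one
# CPU (fine) or by two CPUs interleaved (broken). All names here are
# hypothetical and do not come from the mvpp2 driver.

class TxQueue:
    """Ring of in-flight buffers owned by one CPU, with no locking."""
    def __init__(self, owner_cpu, bufs):
        self.owner_cpu = owner_cpu
        self.bufs = list(bufs)
        self.get_index = 0
        self.count = len(self.bufs)

def reclaim_one(txq, freed):
    """Free one buffer as three separate steps, yielding between them."""
    idx = txq.get_index            # step 1: read the index
    yield
    freed.append(txq.bufs[idx])    # step 2: "free" the buffer
    yield
    txq.get_index = idx + 1        # step 3: write the index back
    txq.count -= 1

def run_serial(*gens):
    """Run each reclaim to completion before starting the next."""
    for gen in gens:
        for _ in gen:
            pass

def run_interleaved(*gens):
    """Advance the reclaims one step at a time, round robin."""
    pending = list(gens)
    while pending:
        for gen in list(pending):
            try:
                next(gen)
            except StopIteration:
                pending.remove(gen)

def reclaim_serial():
    txq, freed = TxQueue(3, ["skb0", "skb1"]), []
    run_serial(reclaim_one(txq, freed), reclaim_one(txq, freed))
    return freed, txq.get_index, txq.count

def reclaim_interleaved():
    txq, freed = TxQueue(3, ["skb0", "skb1"]), []
    run_interleaved(reclaim_one(txq, freed), reclaim_one(txq, freed))
    return freed, txq.get_index, txq.count

if __name__ == "__main__":
    print("owner only :", reclaim_serial())
    print("two CPUs   :", reclaim_interleaved())
```

With only the owner reclaiming, skb0 and skb1 are each freed once. With
two CPUs interleaved, both read index 0 before either writes it back,
so skb0 is freed twice, skb1 is never freed, and the count says the
queue is empty. A double free of an skb in a real driver would be one
way to end up with a stale pointer like the one in the oops above,
though nothing here shows that this is what mvpp2 is actually doing.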

-- 
Steve McIntyre, Cambridge, UK.                                steve at einval.com
  Mature Sporty Personal
  More Innovation More Adult
  A Man in Dandism
  Powered Midship Specialty